Abstract

A large offspring number diploid biparental multilocus population model of Moran type is
our object of study. At each timestep, a pair of diploid individuals drawn uniformly at random 
contribute offspring to the population. The number of offspring can be large relative to the 
total population size. Similar 'heavily skewed' reproduction mechanisms have been considered 
by various authors recently, cf. e.g. Eldon and Wakeley (2006, 2008), and reviewed by Hedgecock 
and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to 
each diploid offspring, and hence ancestral lineages can only coalesce when in distinct individuals.
A separation of timescales phenomenon is thus observed. A result of Möhle (1998) is
extended to obtain convergence of the ancestral process to an ancestral recombination graph 
necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral 
recombination graph is obtained as a special case of our model when the parents contribute only 
one offspring to the population each time. 

Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal 
genealogy at each locus admits simultaneous multiple mergers in up to four groups, and dif- 
ferent loci remain substantially correlated even as the recombination rate grows large. Thus, 
genealogies for loci far apart on the same chromosome remain correlated. Correlation in coa- 
lescence times for two loci is derived and shown to be a function of the coalescence parameters 
of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage 
disequilibrium are shown to be functions of the reproduction parameters of our model, in addi- 
tion to the recombination rate. Correlations in ratios of coalescence times between loci can be 
high, even when the recombination rate is high and sample size is large, in large offspring num- 
ber populations, as suggested by simulations, hinting at how to distinguish between different 
population models. 



Diploidy, in which each offspring receives two sets of chromosomes, one from each of two distinct 
diploid parents, is fairly common among natural populations. Mathematical models in population 
genetics tend to assume, however, that all individuals in a population are haploid, simplifying the 
mathematics. Mendel's Laws describe the mechanism of inheritance as composed of two main 
steps, equal segregation (First Law), and independent assortment (Second Law). The First Law 
proclaims gametes are haploid, i.e. carry only one of each pair of homologous chromosomes. Most 
models in population genetics are thus models of chromosomes, or gene copies. Mendel's Second 
Law proclaims independent assortment of alleles at different genes, or loci, into gametes. Linkage of 
alleles on chromosomes, resulting in non-random association of alleles at different loci into gametes, 
is of course an important exception to the Second Law. 

Coalescent processes (Kingman 1982a,b; Hudson 1983b; Tajima 1983) describe the ancestral
relations of chromosomes (or gene copies) drawn from a natural population. The coalescent was
initially derived from a Cannings (1974) haploid exchangeable population model. Related ancestral
processes take into account population structure (Notohara 1990; Herbots 1997), selection
(Krone and Neuhauser 1997; Neuhauser and Krone 1997; Etheridge et al. 2010), and
recombination between linked loci (Hudson 1983a; Griffiths 1991; Griffiths and Marjoram
1997). The coalescent has proved to be an important advance in theoretical population genetics,
and a valuable tool for inference of evolutionary histories of populations.



Ancestral recombination graphs (ARG) (Hudson 1983a; Griffiths 1991; Griffiths and
Marjoram 1997) trace ancestral lineages of gene copies at linked loci, in which linkage is broken
up by recombination. An ARG is a branching-coalescing graph, in which recombination leads to
branching of ancestral chromosomes, and coalescence to segments rejoining. Coalescence events in
an ARG may not lead to coalescence of gene copies at individual loci. An example ARG for two
linked loci is given below, labelled as ARG(1), with notation borrowed from Durrett (2002). The
labels a and b refer to the two alleles (types) at locus 1 and 2, respectively. A single chromosome
with two linked alleles is denoted by (ab), while chromosomes carrying ancestral alleles at only one
locus are denoted (a) and (b). When coalescence occurs at either locus, the number of alleles at the
corresponding locus is reduced by one. The absorbing state, either (ab) or (a)(b), is reached when
alleles at both loci have coalesced.



$$\mathrm{ARG}(1):\ (ab)(ab) \xrightarrow{r} (a)(b)(ab) \xrightarrow{c} (ab)(b) \xrightarrow{r} (a)(b)(b) \xrightarrow{c} (a)(b)$$

$$\mathrm{ARG}(2):\ (ab)(ab) \xrightarrow{r} (a)(b)(ab) \xrightarrow{r} (a)(b)(a)(b) \xrightarrow{c} (a)(b)$$

In ARG(1), the first transition is a recombination, denoted by $\xrightarrow{r}$, followed by a
coalescence ($\xrightarrow{c}$), in which the two alleles at locus 1 coalesce. Graph ARG(1) serves to
illustrate two important concepts we will be concerned with, namely correlation in coalescence times
between alleles at different loci, and the restriction to binary mergers of ancestral lineages.

Correlation in coalescence times between types at different loci follows from linkage. Alleles at
different loci can become associated due to a variety of factors, including changes in population size,
natural selection, and population structure. Within-generation fecundity variance polymorphism
induces correlation between a neutral locus and the locus associated with the fecundity variance
(Taylor 2009). Sweepstake-style reproduction (Hedgecock et al. 1982; Hedgecock 1994;
Beckenbach 1994; Avise et al. 1988; Palumbi and Wilson 1990; Árnason 2004; Hedgecock
and Pudovkin 2011), in which few individuals produce most of the offspring, has also been shown to
induce correlation in coalescence times between loci (Eldon and Wakeley 2008). Understanding
genome-wide correlations in coalescence times becomes ever more important as multi-loci genetic
data becomes ubiquitous.

The ARG exemplified by ARG(1) is characterised by admitting only binary mergers of ancestral
lineages, i.e. exactly two lineages coalesce in each coalescence event. The restriction to binary
mergers follows from bounds on the underlying offspring distribution, in which the probability of
large offspring numbers becomes negligible in a large population (Kingman 1982a,b). Sweepstake-style
reproduction, in which few individuals contribute very many offspring to the population,
has been suggested to explain the 'shallow' gene genealogy observed for many marine organisms
(Hedgecock et al. 1982; Hedgecock 1994; Avise et al. 1988; Palumbi and Wilson 1990;
Beckenbach 1994; Árnason 2004; Hedgecock and Pudovkin 2011). Large offspring number
models are models of extremely high variance in individual reproductive output. Namely, individuals
can have very many offspring, or up to the order of the population size with non-negligible
probability (Schweinsberg 2003; Eldon and Wakeley 2006; Sargsyan and Wakeley 2008;
Sagitov 2003; Birkner and Blath 2009). Such models do predict shallow gene genealogies,
and can be shown to give better fit to genetic data obtained from Atlantic cod (Árnason 2004)
than the Kingman coalescent (Birkner and Blath 2008; Birkner et al. 2011; Eldon 2011;
Steinrücken et al. 2012). Different large offspring number models will no doubt be appropriate
for different populations, and the identification of large offspring number population models for each
population is an open problem. For the sake of simplicity and mathematical tractability, the simple
large offspring number model considered by Eldon and Wakeley (2006) will be adapted to our
situation.

The coalescent processes derived from large offspring number models belong to a large class of
multiple merger coalescent processes introduced by Donnelly and Kurtz (1999), Pitman (1999)
and Sagitov (1999). Multiple merger coalescent processes ($\Lambda$-coalescents), as the name implies,
admit multiple mergers of ancestral lineages in each coalescence event, in which any number of active
ancestral lineages can coalesce, and at most one such merger occurs each time. In simultaneous
multiple merger coalescent processes (Möhle and Sagitov 2001; Schweinsberg 2000a), any
number of multiple mergers can occur each time, i.e. distinct groups of active ancestral lineages
can coalesce each time. The ancestral recombination graph derived from our diploid large offspring
number model admits simultaneous multiple mergers of ancestral lineages, as exemplified in ARG(2).
The last transition in ARG(2) is a simultaneous multiple merger, in which the two types at each
locus coalesce to separate ancestral chromosomes.

In order to investigate correlations in coalescence times among loci due to skewed offspring 
distribution, we formally derive an ancestral recombination graph, or a coalescent process for many 
linked loci, from our diploid large offspring number model. The key to the proof of convergence to an 
ancestral recombination graph from our diploid model lies in resolving the separation of timescales 
phenomenon we observe. Following Mendel's Laws, the two chromosomes of an offspring come from 
distinct diploid parents. Chromosomes can therefore only coalesce when in distinct individuals. The 
ancestral process will consist of two phases, a dispersion phase occurring on a 'fast' timescale, and 
a coalescence and recombination phase occurring on a 'slow' timescale. In the dispersion phase, 
chromosomes paired together in diploid individuals disperse into distinct individuals. Coalescence
and recombination will only occur on the slow timescale. Similar separation of timescales issues
arise in models of populations structured into infinitely many subpopulations (demes) (Taylor and
Véber 2009). When viewing the diploid individuals in our model as 'demes', our scenario departs
from those describing structured populations by allowing only active ancestral lineages residing in
separate 'demes' to coalesce. A simple extension of a result of Möhle (1998) yields convergence in
our case.

The limiting process we formally obtain is an ancestral recombination graph for many loci
admitting simultaneous multiple mergers of ancestral chromosomes (lineages). In simultaneous
multiple merger coalescent processes, so-called $\Xi$-coalescents, different groups of active ancestral
lineages can coalesce to different ancestors at the same time. Such coalescent processes were first
studied as more abstract mathematical objects by Schweinsberg (2000a), and derived from general
single-locus population models by several authors (Möhle and Sagitov 2001; Sagitov 2003;
Sargsyan and Wakeley 2008; Birkner et al. 2009). A $\Xi$-coalescent with necessarily up to
quadruple simultaneous multiple mergers arises at each marginal locus (i.e. considering each locus
separately) in our model, since four parental chromosomes are involved in each reproduction event.
This structure is intrinsically owed to our diploidy assumptions.

Formulas for the correlation in coalescence times between two alleles at two loci are obtained 
using our ancestral recombination graph (ARG). As predicted by J.E. Taylor (personal communica- 
tion), these correlations will not necessarily be small even for loci separated by high recombination 
rate. This is a novel effect not visible in classical models. The correlation structure will of course 
depend on the underlying coalescent parameters introduced by the large offspring number model 
we adopt. An approximation of the expected value of the statistic $r^2$, commonly used to quantify
linkage disequilibrium, is also investigated using our ARG. In addition, we employ our ARG to 
investigate correlations in ratios of coalescence times between loci for samples larger than two at 
each locus, using simulations. 

A diploid population model with multilocus recombination and skewed offspring 
distribution 

The forward population model 

Consider a population consisting of $N \in \mathbb{N} = \{1, 2, \ldots\}$ diploid individuals, meaning that each
individual contains two chromosomes. Each chromosome is structured into $L \in \mathbb{N}$ loci. We assume
Moran-type dynamics: At each timestep ('generation'), either a small or a large reproduction event
occurs. In a small reproduction event, a single individual chosen uniformly at random from the
population dies, and two other distinct individuals are chosen as parents. A diploid offspring is then
formed by choosing one chromosome from each parent (see Figure 1). The parents always persist.
A small reproduction event occurs with probability $1 - \varepsilon_N$, in which $\varepsilon_N \in (0,1)$ depends on $N$.
In a large reproduction event, a fraction $\psi \in (0,1)$ of the population perishes, meaning that
$\lfloor \psi N \rfloor$ individuals die ($\lfloor x \rfloor$ for $x > 0$ denotes the largest integer not exceeding $x$).
Two distinct individuals are then chosen uniformly from the remaining $N - \lfloor \psi N \rfloor$ individuals to
act as parents of $\lfloor \psi N \rfloor$ offspring, and each offspring is formed independently by choosing one
(potentially recombined) chromosome from each parent (see Figure 1). The population size always
stays constant at $N$ diploid individuals. Individuals that neither reproduce nor die simply persist.

Figure 1: Illustration of 'small' and 'large' reproduction events without recombination. The dotted
arrows indicate the copying of parental chromosomes into offspring chromosomes. The solid arrows
indicate individuals that persist.

Given the two parents, genetic types of the offspring individuals will then be obtained as follows.
Each parent generates a large number of potential offspring chromosomes, of which a fraction $1 - r_N$
are exact copies of the original parental chromosomes, and a fraction $r_N$ are recombinants. Each
chromosome is structured into $L$ loci. Recombination occurs only between loci, and never within. If
recombination between a pair of chromosomes in a parent occurs between loci $\ell$ and
$\ell + 1 \in \{1, \ldots, L\}$ (where we say that $\ell \in \{1, \ldots, L-1\}$ is the crossover point), the two
chromosomes exchange types at all loci from $\ell + 1$ to $L$. Only one crossover point is allowed in
each recombination event. Let $r_N^{(\ell)}$ denote the probability of recombination between loci $\ell$ and
$\ell + 1$ (i.e., the probability that the potential crossover point $X$ equals $\ell$). An offspring
chromosome is a recombinant with probability $r_N = r_N^{(1)} + \cdots + r_N^{(L-1)}$. Given that recombination
happens, we thus have

$$\mathbb{P}(X = \ell) = \frac{r_N^{(\ell)}}{r_N}, \qquad \ell \in \{1, \ldots, L-1\}.$$

Each pair of recombined chromosomes is formed independently of all other pairs. From this large
pool of chromosomes, each new offspring is randomly assigned (independently of all other offspring
in the case of a large reproduction event), one potentially recombined chromosome generated by
each parent. In addition, the reproduction mechanism in different generations is assumed to be
independent.
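The forward dynamics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for our results: the function names `recombine` and `timestep`, the representation of a chromosome as a tuple of allele labels, and the rule that a non-recombinant offspring chromosome copies one of the two parental chromosomes with probability 1/2 each are choices made here for illustration.

```python
import random

def recombine(pair, r_probs, rng):
    """Return one offspring chromosome generated by a parent.

    pair    -- the parent's two chromosomes (tuples of length L)
    r_probs -- r_probs[x - 1] is the probability of crossover point x
               (between loci x and x + 1), for x = 1, ..., L - 1
    Assumption: the 'leading' chromosome is chosen with probability 1/2.
    """
    first, second = pair if rng.random() < 0.5 else pair[::-1]
    u = rng.random()
    acc = 0.0
    for x, p in enumerate(r_probs, start=1):
        acc += p
        if u < acc:
            # loci 1..x from one chromosome, loci x+1..L from the other
            return first[:x] + second[x:]
    return first  # no recombination: exact copy

def timestep(pop, eps, psi, r_probs, rng):
    """One Moran-type timestep; pop is a list of N (chromosome, chromosome) pairs."""
    N = len(pop)
    if rng.random() < eps:
        n_off = int(psi * N)   # large event: floor(psi * N) individuals are replaced
    else:
        n_off = 1              # small event: a single individual is replaced
    dead = set(rng.sample(range(N), n_off))
    survivors = [i for i in range(N) if i not in dead]
    p1, p2 = rng.sample(survivors, 2)  # two distinct parents, both persist
    for i in dead:
        pop[i] = (recombine(pop[p1], r_probs, rng),
                  recombine(pop[p2], r_probs, rng))
    return pop

rng = random.Random(1)
pop = [((2 * i,) * 3, (2 * i + 1,) * 3) for i in range(10)]  # N = 10, L = 3
timestep(pop, eps=0.2, psi=0.3, r_probs=[0.01, 0.01], rng=rng)
```

The population size stays constant because every individual that dies is replaced by exactly one offspring of the two chosen parents.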

Ancestral relationships - notation 

Now we switch from the forward population model to its ancestral process, running backwards in
time. Our sample will consist of $n \in \{1, \ldots, 2N\}$ chromosomes, each subdivided into $L$ loci. Hence,
we need to keep track of the ancestry of $nL$ segments (types/alleles). This implies that the different
segments could end up on up to $nL$ distinct chromosomes in $nL$ distinct ancestral individuals. The
required notation will now be introduced, and our discourse will therefore necessarily become a
little bit technical. However, we believe that a precise description of the objects we are working
with is essential. The key to understanding our notation is that we are working with enumerated
chromosomes, and ordered loci on chromosomes.

At present (that is, time step $m = 0$), assume that we consider an even number $n$ of chromosomes
carried by $n/2$ individuals. The chromosomes are enumerated from 1 to $n$, attaching consecutive
numbers to chromosomes found in the same individual. Our ancestral process will keep track of
the chromosomal ancestral information, that is, which locus is ancestral to which set of sampled
chromosomes. That is, in each generation $m \in \mathbb{N}_0$ (backward in time), we will record all
chromosomes which are active in the sense that they carry at least one locus which is ancestral to the
same locus of at least one chromosome in generation 0. Denote the number of active chromosomes in
generation $m \in \mathbb{N}_0$ by $\beta(m) \in \mathbb{N}$. The number $\beta(m)$ of active chromosomes can both increase, due
to recombination, and decrease, due to coalescence, going back in time.

Now we explain our notation for the loci. For each chromosome $j \in [n] := \{1, \ldots, n\}$, denote
by $L_\ell^{(j)}(m)$ locus $\ell \in [L]$ on chromosome $j$ at time $m$. The subsets $L_\ell^{(j)}(m)$ of $[n]$ contain all the
numbers of chromosomes at present (time step 0) to which locus $\ell$ on active chromosome number $j$
at time step $m$ is ancestral. With this convention, and for each $m \in \mathbb{N}$ and $\ell \in [L]$, the collection

$$\{L_\ell^{(j)}(m),\ j = 1, \ldots, \beta(m)\}$$

which describes the configuration of segments (i.e. which have coalesced and which have not) at
locus $\ell$ at time $m$, is a partition of $[n]$, i.e.

$$L_\ell^{(j)}(m) \cap L_\ell^{(j')}(m) = \emptyset \quad \text{for } j \neq j'$$

and

$$\bigcup_{j=1}^{\beta(m)} L_\ell^{(j)}(m) = [n].$$

Thus, with our notation we can correctly describe the configuration of segments among chromosomes
at any given time. By $C^{(j)}(m)$ we denote chromosome number $j$ at time $m$. At time $m = 0$,

$$C^{(j)}(0) := \{L_1^{(j)}(0), \ldots, L_L^{(j)}(0)\} := \{\{j\}, \ldots, \{j\}\}.$$

For $m > 0$, consider the $j$-th active chromosome at generation $m$, where $j \in [\beta(m)]$. The
corresponding ancestral information at generation $m$ is encoded via an ordered list of subsets of $[n]$,
setting

$$C^{(j)}(m) := \{L_1^{(j)}(m), \ldots, L_L^{(j)}(m)\}, \qquad L_\ell^{(j)}(m) \subset [n],\ \ell \in [L]. \qquad (1)$$
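The bookkeeping above maps directly onto a simple data structure. The following Python sketch is an illustration only: the function names are hypothetical, a chromosome is assumed to be stored as a tuple of `frozenset` objects (one per locus, in locus order), and empty sets are taken to mean that a locus carries no ancestral material.

```python
def initial_chromosomes(n, L):
    """C^(j)(0) = ({j}, ..., {j}) for j = 1, ..., n."""
    return [tuple(frozenset({j}) for _ in range(L)) for j in range(1, n + 1)]

def is_partition_at_locus(chromosomes, locus, n):
    """Check that the sets at the given locus (0-based index), taken over all
    active chromosomes, are pairwise disjoint and have union {1, ..., n}.
    Empty sets are skipped automatically since they contribute nothing."""
    seen = set()
    for chrom in chromosomes:
        block = chrom[locus]
        if seen & block:       # overlap: two chromosomes claim the same sample
            return False
        seen |= block
    return seen == set(range(1, n + 1))

sample = initial_chromosomes(n=4, L=3)
print(all(is_partition_at_locus(sample, locus, 4) for locus in range(3)))
```

Checking the partition property at every locus after each transition is a cheap sanity test when implementing the ancestral process.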

Chromosomes are carried by diploid individuals. Keeping track of the grouping of active
chromosomes into individuals will be important, since by our diploid reproduction mechanism, chromosomal
lineages can only coalesce when in distinct individuals (see Example B below). In analogy with our
previous nomenclature for our ancestral process, an active individual will carry at least one (and at
most two) active chromosome(s). Let $b(m)$ denote the number of active individuals at generation
$m$, where $\beta(m)/2 \le b(m) \le \beta(m)$ for all $m$. The ordered list of active chromosomes and the number
of active individuals (called a 'configuration') at time $m \ge 0$ is denoted by

$$\xi^{n,N}(m) := \{C^{(1)}(m), \ldots, C^{(\beta(m))}(m);\ b(m)\}. \qquad (2)$$

An individual number $i$ at generation $m$ is denoted by $I_i(m)$, for $i \in [b(m)]$. An active individual
is single-marked, if carrying one active chromosome, and is double-marked, if carrying two active
chromosomes. Specifying the arrangement of chromosomes in individuals completes our description
of the (prelimiting) ancestral process. However, since all active individuals are single-marked in
the limiting process, our description of the arrangement of chromosomes in individuals is given in
Section 1.1.1 in the Appendix. That is, each configuration $\xi^{n,N}(m)$ begins with the $2(\beta(m) - b(m))$
ordered consecutive chromosomes of the $\beta(m) - b(m)$ double-marked individuals, followed by the
$2b(m) - \beta(m)$ chromosomes contained in single-marked individuals. With this convention, the set of
single- and double-marked individuals and the grouping of chromosomes into individuals at
generation $m$ is uniquely determined by a configuration $\xi^{n,N}(m)$ of form (2). For notational convenience,
the time index $m$ will be omitted if there is no ambiguity.

For a given sample size $n$, the set of all possible ancestral configurations $\xi^{n,N}$ will be denoted
by $\mathscr{A}_n$. The subset $\mathscr{A}_n^{\mathrm{sm}} \subset \mathscr{A}_n$ of all configurations $\xi^{n,N} = \{C^{(1)}, \ldots, C^{(\beta)};\ b\}$ with $b = \beta$, i.e.
configurations consisting only of single-marked individuals, will play an important role later on.
Indeed, all configurations in the limiting model will be confined to the set $\mathscr{A}_n^{\mathrm{sm}}$, and the pairing of
chromosomes in individuals will become irrelevant.

The mapping cd ('complete dispersion')

$$\mathrm{cd} : \mathscr{A}_n \to \mathscr{A}_n^{\mathrm{sm}}$$

breaks up the pairing of chromosomes into diploid double-marked individuals. More precisely, we
define

$$\mathrm{cd}(\{C^{(1)}, \ldots, C^{(\beta)};\ b\}) := \{C^{(1)}, \ldots, C^{(\beta)};\ \beta\}. \qquad (3)$$

Configurations in $\mathscr{A}_n^{\mathrm{sm}}$ describe configurations in which all active individuals are single-marked, i.e.
carry only one active chromosome.
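A configuration and the map cd can be sketched as follows. The class name `Config` and its methods are hypothetical and introduced here only for illustration; the grouping rule (double-marked individuals first, with chromosomes $2i-1$ and $2i$ in the $i$-th of them) follows the ordering convention described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    chromosomes: tuple  # ordered active chromosomes C^(1), ..., C^(beta)
    b: int              # number of active individuals

    @property
    def beta(self):
        return len(self.chromosomes)

    def individuals(self):
        """Group 1-based chromosome indices into individuals, double-marked first."""
        d = self.beta - self.b  # number of double-marked individuals
        doubles = [(2 * i - 1, 2 * i) for i in range(1, d + 1)]
        singles = [(j,) for j in range(2 * d + 1, self.beta + 1)]
        return doubles + singles

def cd(cfg):
    """Complete dispersion: same chromosomes, every individual single-marked."""
    return Config(cfg.chromosomes, cfg.beta)

cfg = Config(("c1", "c2", "c3", "c4", "c5"), b=3)
print(cfg.individuals())      # two double-marked individuals, one single-marked
print(cd(cfg).individuals())  # five single-marked individuals
```

Because the grouping into individuals is a function of $\beta$ and $b$ alone, cd only needs to overwrite $b$ with $\beta$.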

The effects of recombination and coalescence on the ancestral configurations in the case of two
typical situations will now be illustrated. Example A will illustrate recombination, and Example B 
will illustrate coalescence of two chromosomes. 

Example A. Suppose the most recent previous event in the history of a given configuration
$\xi^{n,N}(m)$ was a small reproduction event (at time $m + 1$), and suppose that the resulting offspring
individual is currently part of our configuration at time $m$, but neither of its parents is, and that
the offspring individual is single-marked, i.e. carries one active chromosome. We obtain $\xi^{n,N}(m+1)$
as follows:

• If there is no recombination during the reproduction event, then the configuration in the
previous generation remains unchanged, i.e. $\xi^{n,N}(m+1) = \xi^{n,N}(m)$.

• If there is recombination, say at a crossover point $X \in \{1, \ldots, L-1\}$, suppose the (single)
offspring chromosome is

$$C^{(j)}(m) = \{L_1^{(j)}(m), \ldots, L_L^{(j)}(m)\}.$$

Necessarily, the two parental chromosomes will be part of the configuration $\xi^{n,N}(m+1)$,
residing in the same double-marked individual. More precisely, the two parental chromosomes,
say $C^{(j)}(m+1)$ and $C^{(j+1)}(m+1)$, are determined by (for $\ell \in [L]$)

$$L_\ell^{(j)}(m+1) = \begin{cases} L_\ell^{(j)}(m) & 1 \le \ell \le X, \\ \emptyset & X + 1 \le \ell \le L, \end{cases}$$

and

$$L_\ell^{(j+1)}(m+1) = \begin{cases} \emptyset & 1 \le \ell \le X, \\ L_\ell^{(j)}(m) & X + 1 \le \ell \le L, \end{cases}$$

in which $\emptyset$ denotes loci not carrying any ancestral segments. The offspring chromosome is of
course not part of $\xi^{n,N}(m+1)$. This transition can be partially trivial (a 'silent recombination'
event), if the crossover point is not in an 'active' area, i.e. if $L_\ell^{(j)} = \emptyset$ for $X + 1 \le \ell \le L$ (or
for all $1 \le \ell \le X$). By way of example, with $L = 3$, if chromosome $C^{(j)} = \{\{j\}, \{j\}, \{j\}\}$
was a recombinant, and the crossover point occurred between loci 2 and 3, the two parental
chromosomes are given by $C^{(j)} = \{\{j\}, \{j\}, \emptyset\}$ and $C^{(j+1)} = \{\emptyset, \emptyset, \{j\}\}$.

Example B. Suppose the most recent previous event in the history of a given configuration
$\xi^{n,N}(m)$ of chromosomes at generation $m$ is a small reproduction event at time $m + 1$, leading to
a coalescence of lineages. This is the case e.g. if both a single-marked offspring individual with
active chromosome $C^{(k)}(m)$ is in our configuration $\xi^{n,N}(m)$, as well as its single-marked parent (say
with currently active chromosome $C^{(j)}(m)$), from which it actually obtained its active chromosome.
Then, to obtain the configuration $\xi^{n,N}(m+1)$, the offspring chromosome $C^{(k)}(m)$ is deleted, and
the resulting ancestral chromosome $C^{(j)}(m+1)$ is given by the family of the unions of the sets
$L_\ell^{(j)}$ and $L_\ell^{(k)}$,

$$C^{(j)}(m+1) = \{L_1^{(j)}(m) \cup L_1^{(k)}(m), \ldots, L_L^{(j)}(m) \cup L_L^{(k)}(m)\}. \qquad (4)$$

All other chromosomes in $\xi^{n,N}(m+1)$ are copied from $\xi^{n,N}(m)$. Again, taking $L = 3$, if chromosomes
$C^{(j)} = \{\{j\}, \{j\}, \{j\}\}$ and $C^{(k)} = \{\{k\}, \{k\}, \{k\}\}$ coalesce, the resulting ancestral chromosome
is given by $C^{(j)} = \{\{j, k\}, \{j, k\}, \{j, k\}\}$.
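The two worked cases with $L = 3$ in Examples A and B can be reproduced with a short Python sketch. The helper names `split_at` and `merge` are hypothetical, and chromosomes are assumed to be tuples of `frozenset` objects with the empty set standing for a locus without ancestral material.

```python
EMPTY = frozenset()

def split_at(chrom, x):
    """Backward-in-time recombination at crossover point x (between loci x and
    x + 1): one parent keeps loci 1..x, the other keeps loci x+1..L."""
    L = len(chrom)
    left = chrom[:x] + (EMPTY,) * (L - x)
    right = (EMPTY,) * x + chrom[x:]
    return left, right

def merge(c1, c2):
    """Coalescence: locus-by-locus union of the ancestral sets."""
    return tuple(a | b for a, b in zip(c1, c2))

j, k = 1, 2
Cj = (frozenset({j}),) * 3
Ck = (frozenset({k}),) * 3

print(split_at(Cj, 2))  # Example A: crossover between loci 2 and 3
print(merge(Cj, Ck))    # Example B: coalescence of chromosomes j and k
```

A recombination is silent in the sense of Example A exactly when one of the two tuples returned by `split_at` consists only of empty sets.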

Scaling and classification of transitions 

In order to obtain a non-trivial scaling limit for $\{\xi^{n,N}(m)\}$ as $N \to \infty$, the limit theorem of
Möhle and Sagitov (2001) (cf. also the special case considered in Eldon and Wakeley (2006))
suggests one should, for some constant $c > 0$, choose probability $1 - c/N^2$ for the small reproduction
events, $c/N^2$ for the large reproduction events, i.e., setting

$$\varepsilon_N = c/N^2, \qquad (5)$$

and speed up time by $N^2$. For the recombination rate to be non-trivial in the limit (i.e. neither
zero nor infinitely large), we require that all recombination values $r_N^{(\ell)}$ scale in units of $N$, i.e. for each
crossover point $\ell \in [L] \setminus \{L\}$,

$$r_N^{(\ell)} := \frac{r^{(\ell)}}{N}, \qquad 0 < r^{(\ell)} < \infty. \qquad (6)$$

Thus, even though our timescale is in units of $N^2$ timesteps, recombination is scaled in units of $N$
timesteps. On the level of single lineages the probability of recombination is of the order $O(N^{-2})$.
Indeed, after a small reproduction event, the probability of drawing an offspring is $1/N$. The
probability that the offspring carries a recombined chromosome is of order $O(1/N)$.

Given the cornucopia of possible transitions from $\xi^{n,N}(m)$ to $\xi^{n,N}(m+1)$, it will be important
to identify those transitions which are expected to be visible in the limiting process.

All possible transitions fall into the following three regimes:

• Those transitions which happen at probability of order $O(N^{-2})$ per generation, which will be
visible in the limit (since time will be scaled by $N^2$). They will be called effective transitions
and will appear at a finite positive rate in the limit.

• Further, there are transitions which happen less frequently, typically with probability of order
$O(N^{-3})$ or smaller per generation, which will thus become negligible as $N \to \infty$ and hence
be invisible in the limit. These will be called negligible transitions.

• Finally, there are transitions which happen much more frequently (with probability of order
$O(N^{-1})$ or even $O(1)$ per generation). At first sight, one might think that their presence
might lead to chaotic behaviour in the limit. However, this will not be the case. Instead, these
transitions will happen 'instantaneously' in the limit, and result in a projection of the states of
our process from $\mathscr{A}_n$ into the subspace $\mathscr{A}_n^{\mathrm{sm}}$, which will be the limiting state space. This will
be proved below. Such transitions will be called projective or instantaneous transitions. The
identity transition is a special case of a projective transformation.

In the Appendix (section 1.1), a full classification of all transitions into the above groups is
provided.
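The three regimes can be made concrete with a small calculation: a transition with per-generation probability of order $N^{-k}$ is expected to occur about $N^{2-k}$ times in the $N^2$ timesteps that make up one unit of rescaled time. The Python sketch below is purely illustrative; the function names are hypothetical and constants in front of the powers of $N$ are ignored.

```python
def expected_events(exponent, N):
    """Expected number of occurrences, over N**2 timesteps, of a transition
    with per-generation probability N**(-exponent) (constants ignored)."""
    return float(N) ** (2 - exponent)

def regime(exponent):
    """Classify a transition by the order N**(-exponent) of its probability."""
    if exponent < 2:
        return "instantaneous"
    if exponent == 2:
        return "effective"
    return "negligible"

for N in (10 ** 2, 10 ** 4):
    print(N, [(regime(k), expected_events(k, N)) for k in (0, 1, 2, 3)])
```

As $N$ grows, the count diverges for exponents 0 and 1, stays of order one for exponent 2, and vanishes for exponent 3, which matches the instantaneous, effective and negligible regimes respectively.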

Instantaneous and effective transitions 

The most important transitions and their effect for the limiting process will now be described
in detail. Consider the following most recent events in the history of a set of lineages, i.e. events
occurring at time $m + 1$, from the perspective of the ancestral process $\xi^{n,N}(m)$ at time $m$:

• Event 1 (silent): A small reproduction event occurs, but the offspring is not active. This is
the most likely event, and is of the order $O(1)$, but does not affect our ancestral configuration
process $\xi^{n,N}(m)$, i.e. $\xi^{n,N}(m+1) = \xi^{n,N}(m)$. This event leads to an identity transition (a
trivial instantaneous transition).

• Event 2 (dispersion): A small reproduction event occurs, the offspring is active in our
sample but neither parent is, and recombination does not occur. This is a relatively frequent event
which occurs with a probability of the order $O(N^{-1})$ per generation (since the probability that
the offspring is in the sample is $b(m)/N$). If the offspring carries only one active chromosome,
we again see an identity transition, i.e. $\xi^{n,N}(m+1) = \xi^{n,N}(m)$. If the offspring carries two
active chromosomes, i.e. is a double-marked individual, the two active chromosomes will
disperse to two separate individuals, who will then become single-marked individuals. Formally,
for $\xi = \{C^{(1)}, \ldots, C^{(\beta)};\ b\} \in \mathscr{A}_n$ with at least one double-marked individual ($b < \beta$), define
the map $\mathrm{disp}_i(\cdot) : \mathscr{A}_n \to \mathscr{A}_n$ dispersing the chromosomes paired in individual $i$,

$$\mathrm{disp}_i(\xi) := \{C^{(1)}, \ldots, C^{(2i-2)}, C^{(2i+1)}, C^{(2i+2)}, \ldots, C^{(2(\beta-b))}, C^{(2i-1)}, C^{(2i)}, C^{(2(\beta-b)+1)}, \ldots, C^{(\beta)};\ b+1\} \qquad (7)$$

if $1 \le i \le \beta - b$ and $\mathrm{disp}_i(\xi) := \xi$ otherwise. Recall that the $i$-th double-marked individual
has chromosomes labelled $2i - 1$ and $2i$. For $\xi^{n,N}(m)$, if the $i$-th double-marked individual is
affected, we have the transition $\xi^{n,N}(m+1) = \mathrm{disp}_i(\xi^{n,N}(m))$.

The dispersion events will happen instantaneously as $N \to \infty$ (recall we are speeding time up
by $N^2$), and thus will, in the limit, lead to an immediate complete dispersion of all
chromosomes paired in double-marked individuals. If in the course of events, a new double-marked
individual emerges due to pairing of active chromosomes in the same diploid individual, a
dispersion of the chromosomes will occur immediately. Event 2 will hence result in a permanent
instantaneous transition, mapping our current state $\xi \in \mathscr{A}_n$ into the subspace $\mathscr{A}_n^{\mathrm{sm}}$ by means
of the map cd defined in (3). Our limiting process will thus live, with probability one for each
given $t > 0$, in $\mathscr{A}_n^{\mathrm{sm}}$, even if we start with a configuration from $\mathscr{A}_n \setminus \mathscr{A}_n^{\mathrm{sm}}$ at time $t = 0$.

• Event 3 (recombination): A small reproduction event occurs, a single-marked offspring
but neither parent is in our sample, and recombination affects the active chromosome at
a crossover point $x$. This event has probability of the order $O(N^{-2})$ per generation, and
will thus be visible with finite positive rate in the limit. It is an effective transition, which
can be described formally as follows. Define the recombination operation recomb acting on
chromosome $j$ and crossover point $x$ for a configuration $\xi \in \mathscr{A}_n^{\mathrm{sm}}$ as

$$\mathrm{recomb}_{j,x}(\xi) := \{C^{(1)}, \ldots, C^{(j-1)}, C^{(j,1)}, C^{(j,2)}, C^{(j+1)}, \ldots, C^{(\beta)};\ \beta + 1\}, \qquad (8)$$

where

$$C^{(j,1)} = \{L_1^{(j,1)}, \ldots, L_L^{(j,1)}\} \quad \text{with} \quad L_\ell^{(j,1)} = \begin{cases} L_\ell^{(j)} & 1 \le \ell \le x - 1, \\ \emptyset & x \le \ell \le L, \end{cases}$$

and

$$C^{(j,2)} = \{L_1^{(j,2)}, \ldots, L_L^{(j,2)}\} \quad \text{with} \quad L_\ell^{(j,2)} = \begin{cases} \emptyset & 1 \le \ell \le x - 1, \\ L_\ell^{(j)} & x \le \ell \le L \end{cases}$$

(if one of $C^{(j,1)}$, $C^{(j,2)}$ equals $\{\emptyset, \ldots, \emptyset\}$, we define $\mathrm{recomb}_{j,x}(\xi) := \xi$, giving rise to a silent
recombination event).

• Event 4 (pairwise coalescence): A small reproduction event occurs, one single-marked
parent and a single-marked offspring are in the sample, the active chromosome is inherited
from the parent in the sample, and recombination does not occur. This event occurs with
probability of order $O(N^{-2})$ and will therefore be visible in the limit with finite positive rate,
hence gives rise to an effective transition. It will lead to a binary coalescence of lineages
and can formally be described as follows. The ancestral chromosome $C^{(j_1, j_2)}$ formed by the
coalescence of chromosomes $j_1$ and $j_2$ is given by

$$C^{(j_1, j_2)} = \{L_1^{(j_1)} \cup L_1^{(j_2)}, \ldots, L_L^{(j_1)} \cup L_L^{(j_2)}\} \qquad (9)$$

if $1 \le j_1 < j_2 \le \beta$. Define the binary coalescence operation pairmerge acting on chromosomes
$j_1$ and $j_2$ ($1 \le j_1 < j_2$) in a configuration $\xi \in \mathscr{A}_n^{\mathrm{sm}}$ as

$$\mathrm{pairmerge}_{j_1, j_2}(\xi) := \{C^{(1)}, \ldots, C^{(j_1, j_2)}, \ldots, C^{(j_2 - 1)}, C^{(j_2 + 1)}, \ldots, C^{(\beta)};\ \beta - 1\} \qquad (10)$$

if $1 \le j_1 < j_2 \le \beta$ (otherwise, we put $\mathrm{pairmerge}_{j_1, j_2}(\xi) := \xi$).

• Event 5 (multiple merger coalescence): A large reproduction event occurs, neither parent but (possibly several) single-marked offspring are in our sample, and recombination does not occur. This is again an event with probability of order $O(N^{-2})$ per generation; it will therefore be visible in the limit with finite positive rate and gives rise to an effective transition. The offspring chromosomes are assigned their parental chromosomes independently and uniformly at random, since due to an immediate 'complete dispersion' via Event 2 each offspring individual carries precisely one active chromosome. Now we formally define the multiple coalescence operation groupmerge for $\xi \in \mathscr{A}^{\mathrm{sm}}_n$ and pairwise disjoint subsets $J_1, J_2, J_3, J_4 \subset [\beta]$ in which either at least one $|J_i| \ge 3$ or at least two of the $|J_i| \ge 2$. This transition is thus genuinely different from a pairmerge transition. Let $J_i$ denote the set of offspring chromosomes derived from parental chromosome $i$. Then

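$$\mathrm{groupmerge}_{J_1,J_2,J_3,J_4}(\xi) := \big\{C^{(J_1)}, C^{(J_2)}, C^{(J_3)}, C^{(J_4)}, C^{(j)},\ j \in [\beta] \setminus (J_1 \cup J_2 \cup J_3 \cup J_4);\ \beta'\big\} \tag{11}$$

with ($(x)^+ := \max(x,0)$)

$$\beta' := \beta - \sum_{i=1}^{4} \big(|J_i| - 1\big)^+,$$

and the four parental chromosomes, at least one of which is involved in a merger, are given by ($1 \le i \le 4$)

$$C^{(J_i)} := \Big\{\bigcup_{j \in J_i} R^{(j)}_1, \ldots, \bigcup_{j \in J_i} R^{(j)}_L\Big\}.$$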

The chromosome(s) $C^{(j)}$ appearing in $\mathrm{groupmerge}_{J_1,J_2,J_3,J_4}(\xi)$ denote the chromosomes in $\xi$ that are not involved in a merger.

• All other events: These either do not affect our ancestral process, or have a probability of order smaller than $N^{-2}$, so that they are absent in the limit after rescaling. A complete classification of these events is given in the Appendix (Section 1.1).
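
The two merger operations can be sketched in Python. This is only an illustration under stated assumptions: a chromosome is represented, hypothetically, as a tuple with one set of sample labels per locus, indices are 0-based, and the names `merge_chromosomes`, `pairmerge` and `groupmerge` are chosen here for readability rather than taken from any existing software.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# Assumed representation: one frozenset of sample labels per locus.
Chromosome = Tuple[FrozenSet[int], ...]


def merge_chromosomes(chroms: Sequence[Chromosome]) -> Chromosome:
    """Locus-wise union of the label sets carried by several chromosomes."""
    n_loci = len(chroms[0])
    return tuple(
        frozenset().union(*(c[locus] for c in chroms)) for locus in range(n_loci)
    )


def pairmerge(config: List[Chromosome], j1: int, j2: int) -> List[Chromosome]:
    """Binary merger of chromosomes j1 < j2; inadmissible indices leave
    the configuration unchanged."""
    if not (0 <= j1 < j2 < len(config)):
        return list(config)
    out = list(config)
    out[j1] = merge_chromosomes([config[j1], config[j2]])
    del out[j2]  # the number of chromosomes drops by one
    return out


def groupmerge(config: List[Chromosome], groups: Sequence[Set[int]]) -> List[Chromosome]:
    """Simultaneous merger: every non-empty index set in `groups` collapses
    into one parental chromosome; all other chromosomes are kept as they are."""
    involved: Set[int] = set()
    for g in groups:
        if involved & set(g):
            raise ValueError("groups must be pairwise disjoint")
        involved |= set(g)
    parents = [merge_chromosomes([config[j] for j in sorted(g)]) for g in groups if g]
    untouched = [c for j, c in enumerate(config) if j not in involved]
    return parents + untouched
```

With four chromosomes, merging the groups {0, 1} and {2, 3} at once leaves two chromosomes, in line with the count $\beta - \sum_i (|J_i| - 1)^+$.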



The limiting dynamics and state space 

The expected dynamics of the limiting continuous-time Markov chain $\{\xi(t), t \ge 0\}$, taking values in $\mathscr{A}_n$, as $N \to \infty$, will now briefly be discussed.

• Complete dispersion (Event 2) of the sampled chromosomes is the first event to occur (between times $t = 0$ and $t = 0^+$). By $I_i$ we denote individual number $i$ (see Section 1.1.1 in the Appendix). At time $t = 0$, when $\xi(0) \in \mathscr{A}_n$, we assume that all $n$ sampled chromosomes are paired in double-marked individuals ($n$ even);

$$\xi(0) = \big\{I_i : I_i = \{C^{(2i-1)}, C^{(2i)}\},\ 1 \le i \le n/2\big\}. \tag{12}$$

Immediately (at time $0^+$), the chromosomes disperse into single-marked individuals,

$$\xi(0^+) = \mathrm{cd}(\xi(0)) = \big\{I_i : I_i = \{C^{(i)}, \emptyset\},\ 1 \le i \le n\big\}. \tag{13}$$

• Throughout the evolution of the process, whenever double-marked individuals appear (e.g. from a coalescence of lineages), Event 2 will immediately change our configuration to the corresponding 'all dispersed' configuration, i.e., for each $t > 0$,

$$\xi(t+) = \mathrm{cd}(\xi(t)) \in \mathscr{A}^{\mathrm{sm}}_n.$$

Such 'flickering' states do not affect any quantities of interest of our genealogy, so we can assume that they are removed from the limit by choosing the càdlàg modification of $\{\xi(t), t \ge 0\}$, taking only values in $\mathscr{A}^{\mathrm{sm}}_n$ for all $t > 0$ (this modification does not affect the finite-dimensional distributions of $\{\xi(t), t > 0\}$).

• Recombination (Event 3) appears in the limiting process at total rate $r = r^{(1)} + \cdots + r^{(L-1)}$, where a recombination involving a given crossover point $\ell$ appears with rate $r^{(\ell)}$ on any lineage. Indeed, from our scaling considerations, the probability that a given single-marked individual sees no recombination at $\ell$ in a small reproduction event for more than $N^2 t$ time steps satisfies (with $r^{(\ell)}_N = r^{(\ell)}/N$)

$$\Big(1 - \frac{1}{N}\, r^{(\ell)}_N\Big)^{\lfloor N^2 t \rfloor} \to e^{-r^{(\ell)} t}$$

as $N \to \infty$ (recall the scaling of the recombination probabilities; the probability for any given individual to be the child in a small reproduction event is $1/N$). Hence the waiting time for this event is exponential with rate $r^{(\ell)}$.

• Coalescences appear according to the effective transitions described by Event 4 and Event 5. From the point of view of a given pair of active chromosomes in different individuals, a single pairwise coalescence occurs at rate $1 + \frac{c\psi^2}{4} C_{\beta;2;\beta-2}$, with $C_{\beta;2;\beta-2}$ from (15) (with $r = 1$, $s = \beta - 2$). Here the 1 comes from a pairwise coalescence in a small reproduction event, and the term $\frac{c\psi^2}{4} C_{\beta;2;\beta-2}$ from a large merger event (the rates can easily be derived from considerations similar to those for the recombination rate above). Recall that both coalescing chromosomes have to 'successfully flip a $\psi$-coin' in order to take part in the large coalescence event, and are then distributed uniformly into four groups according to the choice of one of the four potential parental chromosomes.

• Large coalescence events (involving at least three individuals, or at least two simultaneous pairwise mergers) happen with overall rate $\frac{c\psi^2}{4}$ times the corresponding coalescence rate of a $\Xi$-coalescent, obtained by letting each individual take part in the merger independently with probability $\psi$. The participating individuals are then distributed uniformly into four groups according to the chosen parental chromosome. The corresponding rate is given in the third line of (14) (cf. also (15)).
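
The exponential waiting time for recombination described in the list above can be checked numerically. The following Python sketch assumes that, per time step, a given lineage is the child in a small reproduction event with probability $1/N$ and then recombines at a fixed crossover point with probability $r/N$; the function name and its parameters are illustrative only.

```python
import math


def prob_no_recombination(N: int, r: float, t: float) -> float:
    """Probability of seeing no recombination at one crossover point on a
    given lineage during floor(N^2 * t) time steps (assumed scaling)."""
    steps = math.floor(N * N * t)
    per_step = (1.0 / N) * (r / N)  # child with prob. 1/N, then recombination with prob. r/N
    return (1.0 - per_step) ** steps
```

For large `N` the value should be close to `math.exp(-r * t)`, the survival function of an exponential waiting time with rate `r`.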



The limiting ancestral process 

According to the above considerations, it is now plausible to consider the following Markov chain as the limiting ancestral process. This will be proved below, with most computations provided in the Appendix. The $m$-th falling factorial is given by $(a)_m := a(a-1)\cdots(a-m+1)$, $(a)_0 := 1$. The operations pairmerge, recomb and groupmerge for elements of $\mathscr{A}^{\mathrm{sm}}_n$ were defined above in the section on scaling. Now we define the generator of the continuous-time ancestral recombination graph derived from our model.

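Definition 1.1 (Limiting multilocus diploid ancestral recombination graph). The continuous-time Markov chain $\{\xi(t), t \ge 0\}$ with values in $\mathscr{A}^{\mathrm{sm}}_n$, initial condition $\xi(0) := \mathrm{cd}(\xi)$ for $\xi \in \mathscr{A}_n$, and transition matrix $G$, whose entries for elements $\xi', \xi \in \mathscr{A}^{\mathrm{sm}}_n$, $\xi' \ne \xi$, are given by ($J := (J_1, \ldots, J_4)$)

$$G_{\xi,\xi'} = \begin{cases} 1 + \frac{c\psi^2}{4}\, C_{\beta;2;\beta-2} & \text{if } \xi' = \mathrm{pairmerge}_{j_1,j_2}(\xi), \\ r^{(\ell)} & \text{if } \xi' = \mathrm{recomb}_{j,\ell}(\xi), \\ \frac{c\psi^2}{4}\, C_{\beta;|J|} & \text{if } \xi' = \mathrm{groupmerge}_{J}(\xi), \\ 0 & \text{for all other } \xi' \ne \xi \end{cases} \tag{14}$$

(where in the penultimate line we only consider cases where either at least one $|J_i| \ge 3$ or at least two of the $|J_i| \ge 2$), with

$$C_{\beta;|J|} := C_{\beta;|J_1|,|J_2|,|J_3|,|J_4|;\beta-(|J_1|+|J_2|+|J_3|+|J_4|)}$$

and ($s = b - k_1 - \cdots - k_r \ge 0$, $x \wedge y := \min(x,y)$)

$$C_{b;k_1,\ldots,k_r;s} := \frac{4}{\psi^2} \sum_{l=0}^{s \wedge (4-r)} \binom{s}{l} \frac{(4)_{r+l}}{4^{k_1+\cdots+k_r+l}}\, (1-\psi)^{s-l}\, \psi^{k_1+\cdots+k_r+l}. \tag{15}$$

For the diagonal elements, one has of course

$$G_{\xi,\xi} = -\sum_{\xi' \ne \xi} G_{\xi,\xi'}. \tag{16}$$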



The rates in (15) are the transition rates of the $\Xi$-coalescent (a simultaneous multiple merger coalescent) with

$$\Xi = \delta_{(\psi/4,\,\psi/4,\,\psi/4,\,\psi/4,\,0,\,0,\,\ldots)}$$

when $r$ distinct groups of ancestral lineages merge. The number of lineages in each group is given by $k_1, \ldots, k_r$, given $\beta$ active ancestral lineages. The number $s = \beta - (k_1 + \cdots + k_r) \ge 0$ gives the number of lineages (ancestral chromosomes) unaffected by the merger (cf. Schweinsberg (2000a), Thm. 2). The particular form of $\Xi$ given above follows from the fraction $\psi$ of the population replaced by the offspring of the two parents in a large reproduction event, and our assumption that each parent contributes exactly one chromosome to each offspring. We have the following convergence result.
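
The merger rates for fixed $\psi$ can be evaluated directly. The Python sketch below assumes the form $C_{b;k_1,\ldots,k_r;s} = \frac{4}{\psi^2}\sum_{l=0}^{s\wedge(4-r)}\binom{s}{l}\frac{(4)_{r+l}}{4^{k_1+\cdots+k_r+l}}(1-\psi)^{s-l}\psi^{k_1+\cdots+k_r+l}$ for the rate in (15); the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def falling_factorial(a: int, m: int) -> int:
    """(a)_m = a (a-1) ... (a-m+1), with (a)_0 = 1."""
    out = 1
    for i in range(m):
        out *= a - i
    return out


def xi_rate(ks: Sequence[int], s: int, psi: float) -> float:
    """Rate at which r = len(ks) groups of sizes ks merge simultaneously
    while s further lineages remain unaffected (fixed psi)."""
    r = len(ks)
    if not 1 <= r <= 4:
        raise ValueError("at most four groups can merge at once")
    if not 0.0 < psi <= 1.0:
        raise ValueError("psi must lie in (0, 1]")
    k_total = sum(ks)
    total = 0.0
    # l counts unaffected lineages that took part in the event but landed
    # alone on one of the remaining 4 - r parental chromosomes.
    for l in range(min(s, 4 - r) + 1):
        total += (
            comb(s, l)
            * falling_factorial(4, r + l) / 4 ** (k_total + l)
            * (1.0 - psi) ** (s - l)
            * psi ** (k_total + l)
        )
    return 4.0 / psi**2 * total
```

For two lineages the rate is 1 for every $\psi$, and with one bystander lineage it equals $1 - \psi/4$: the bystander either does not take part (probability $1-\psi$) or takes part and is assigned to a different parental chromosome (probability $3\psi/4$).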

Theorem 1.2. Let $\{\xi^{n,N}(m), m \ge 0\}$ be the ancestral process of a sample of $n$ chromosomes in a population of size $N$ and assume the scaling relations $\varepsilon_N = c/N^2$ and $r^{(\ell)}_N = r^{(\ell)}/N$. Then, starting from $\xi^{n,N}(0) \in \mathscr{A}_n$, we have that

$$\{\xi^{n,N}(\lfloor N^2 t \rfloor)\} \to \{\xi(t)\} \quad \text{as } N \to \infty$$

in the sense of the finite-dimensional distributions on the interval $(0, \infty)$. The initial value of the limiting process is given by

$$\xi(0) = \mathrm{cd}(\xi^{n,N}(0)) \in \mathscr{A}^{\mathrm{sm}}_n.$$

A proof can be found in the Appendix. If $c = 0$, the classical ancestral recombination graph for a diploid population with recombination in the spirit of Griffiths and Marjoram (1997) results.

General diploid Moran-type models: "random" $\psi$

One of the aims of the present work is to understand the genome-wide correlations in gene genealogies induced by sweepstake-style reproduction. So far, we have discussed this for a very simple example of a sweepstake mechanism (analogous to the one considered in Eldon and Wakeley (2006)). More precisely, the fraction $\psi \in (0,1)$ of the population replaced by the offspring of a single pair of individuals in a large offspring number event has hitherto been assumed to be (approximately) constant. Along the lines of the previous discussion, an ancestral recombination graph with a randomized offspring distribution can be derived (a comprehensive discussion of single-locus haploid Moran models in the domain of attraction of $\Lambda$-coalescents can be found in a recent article of Huillet and Möhle (2011)). Even though $\psi$ is now considered a random variable, the population size stays constant at $N$ diploid individuals. Allowing $\psi$ to be random may be biologically more realistic than taking $\psi$ to be a constant. On the other hand, the problem of identifying suitable classes of probability distributions for $\psi$, reflecting the specific biology of given natural populations, is still open and an area of active research.

To explain the convergence arguments when $\psi$ is random, let the random variable $\Psi_N$, taking values in $[N-2]$, denote the random number of diploid offspring contributed by the single reproducing pair of parents at each timestep; a new realisation of $\Psi_N$ is drawn before each reproduction event. Again, we consider the effect of such a reproduction mechanism on coalescence events in a sample. The probability that two given chromosomes residing in two single-marked individuals in the sample coalesce in the previous timestep, given the value of $\Psi_N$, is

$$\mathbb{P}(\{\text{pair coalescence}\} \mid \Psi_N = k) = \frac{1}{4}\, 1_{\{k=1\}} \frac{4}{N(N-1)} + \frac{1}{4}\, 1_{\{k>1\}} \left( \frac{4k}{N(N-1)} + \frac{k(k-1)}{N(N-1)} \right), \tag{17}$$

where the first and second terms on the right-hand side describe the case where one parent and one offspring are drawn, the third term covers the case where two offspring are drawn, and the factor $1/4$ accounts for the probability that the two chromosomes in question descend from the same parental chromosome. Define

$$c_N := 4\, \mathbb{P}(\{\text{pair coalescence}\}) \tag{18}$$

$$\phantom{c_N :} = 4 \sum_{k=1}^{N-2} \mathbb{P}(\{\text{pair coalescence}\} \mid \Psi_N = k)\, \mathbb{P}(\Psi_N = k) = \mathbb{E}\left[ \frac{\Psi_N(\Psi_N+3)}{N(N-1)} \right] \tag{19}$$

(the factor 4 facilitates comparison with the haploid case). The sequence of laws $\mathcal{L}(\Psi_N)$, $N \in \mathbb{N}$, will be assumed to satisfy the following three conditions:

$$c_N \to 0 \quad \text{as } N \to \infty, \tag{20}$$

$$\frac{1/\mathbb{E}[\Psi_N/N]}{1/c_N} = \frac{c_N}{\mathbb{E}[\Psi_N/N]} = \frac{\mathbb{E}[\Psi_N(\Psi_N+3)]}{(N-1)\,\mathbb{E}[\Psi_N]} \to 0 \quad \text{as } N \to \infty, \tag{21}$$

and there exists a probability measure $F$ on $[0,1]$ such that

$$\frac{1}{c_N}\, \mathbb{P}(\Psi_N > Nx) \xrightarrow[N \to \infty]{} \int_{(x,1]} \frac{1}{y^2}\, F(dy) \tag{22}$$

for all continuity points $x \in (0,1]$ of $F$.



Condition (20) is necessary for any limit process of the genealogies to be a continuous-time Markov chain, condition (21) ensures that a separation of time scales phenomenon occurs, and (22) fixes the limit dynamics of the large merging events (it is analogous to the necessary condition (13) of Sagitov (1999) in the haploid case). In the proof of convergence to a limit process we will recall equivalent conditions to (22) (see Appendix, Section 1.4). Condition (20) implies (see Section 1.4 in the Appendix)

$$\mathbb{E}[\Psi_N/N] \to 0 \quad \text{as } N \to \infty, \tag{23}$$

i.e. the probability for a given individual to be an offspring in a given reproduction event becomes small. Hence, (23) and (21) together show that there will be two diverging time-scales: the "short" time-scale $1/\mathbb{E}[\Psi_N/N]$ on which chromosomes paired in double-marked individuals disperse into single-marked individuals, and the "long" time-scale $1/c_N$ over which we observe non-trivial ancestral coalescences.

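In order to obtain a non-trivial genealogical limit process, we will then speed up time by a factor of $4/c_N$, i.e., $4/c_N$ reproduction events correspond to one coalescent time unit (see Thm. 1.3 below). This time rescaling is chosen in order for two chromosomes to coalesce at rate 1 in the limit. The required scaling relation for the recombination rates is now

$$r^{(\ell)}_N \sim \frac{c_N\, r^{(\ell)}}{4\, \mathbb{E}[\Psi_N/N]} \quad \text{as } N \to \infty \tag{24}$$

with $r^{(\ell)} \in [0, \infty)$ fixed for $\ell = 1, \ldots, L-1$ (where $f(N) \sim g(N)$ means $\lim_{N\to\infty} f(N)/g(N) = 1$). An intuitive explanation for the requirement (24) is the following: since the probability for a given individual to be an offspring in a given reproduction event is $\mathbb{E}[\Psi_N/N]$, after speeding up time by $4/c_N$, recombination events between locus $\ell$ and $\ell+1$ occur on any lineage as a Poisson process with rate $(4/c_N)\, \mathbb{E}[\Psi_N/N]\, r^{(\ell)}_N \approx r^{(\ell)}$.

A simple sufficient condition for (21) is the following: for any $\varepsilon > 0$,

$$N\, \mathbb{P}(\Psi_N > \varepsilon N) \to 0 \quad \text{as } N \to \infty. \tag{25}$$

Indeed, for $\varepsilon < 1$ (so that $N > \varepsilon N$) we have

$$\begin{aligned} \mathbb{E}[\Psi_N^2] &= \sum_{k=1}^{\lfloor \varepsilon N \rfloor} k^2\, \mathbb{P}(\Psi_N = k) + \sum_{k=\lfloor \varepsilon N \rfloor+1}^{N} k^2\, \mathbb{P}(\Psi_N = k) \\ &\le \sum_{k=1}^{\lfloor \varepsilon N \rfloor} k\, \varepsilon N\, \mathbb{P}(\Psi_N = k) + \sum_{k=\lfloor \varepsilon N \rfloor+1}^{N} N^2\, \mathbb{P}(\Psi_N = k) \\ &\le \varepsilon N\, \mathbb{E}[\Psi_N] + N^2\, \mathbb{P}(\Psi_N > \varepsilon N). \end{aligned}$$

Dividing by $N\, \mathbb{E}[\Psi_N]$ gives

$$\frac{\mathbb{E}[\Psi_N^2]}{N\, \mathbb{E}[\Psi_N]} \le \varepsilon + \frac{N\, \mathbb{P}(\Psi_N > \varepsilon N)}{\mathbb{E}[\Psi_N]},$$

and, since $\mathbb{E}[\Psi_N] \ge 1$,

$$\limsup_{N \to \infty} \frac{\mathbb{E}[\Psi_N^2]}{N\, \mathbb{E}[\Psi_N]} \le \varepsilon + \limsup_{N \to \infty} N\, \mathbb{P}(\Psi_N > \varepsilon N) = \varepsilon.$$

Thus, condition (21) is obtained since we can choose $\varepsilon$ to be as small as we like.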



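The limiting genealogical process will then be a continuous-time Markov chain on $\mathscr{A}^{\mathrm{sm}}_n$ with generator matrix $G$ whose off-diagonal elements are given by (for the values on the diagonal we again have (16))

$$G_{\xi,\xi'} = \begin{cases} C_{\beta;2;\beta-2} & \text{if } \xi' = \mathrm{pairmerge}_{j_1,j_2}(\xi), \\ r^{(\ell)} & \text{if } \xi' = \mathrm{recomb}_{j,\ell}(\xi), \\ C_{\beta;|J|} & \text{if } \xi' = \mathrm{groupmerge}_{J_1,J_2,J_3,J_4}(\xi), \\ 0 & \text{for all other } \xi' \ne \xi, \end{cases} \tag{26}$$

where

$$C_{\beta;|J|} := C_{\beta;|J_1|,|J_2|,|J_3|,|J_4|;\beta-(|J_1|+|J_2|+|J_3|+|J_4|)},$$

$k = (k_1, \ldots, k_r)$, $|k| = k_1 + \cdots + k_r$, and

$$\begin{aligned} C_{b;k;s} &= 4 \sum_{l=0}^{s \wedge (4-r)} \binom{s}{l} \frac{(4)_{r+l}}{4^{|k|+l}} \int_{[0,1]} x^{|k|+l} (1-x)^{s-l}\, \frac{1}{x^2}\, F(dx) \\ &= F(\{0\})\, 1_{\{r=1,\, k_1=2\}} + 4 \sum_{l=0}^{s \wedge (4-r)} \binom{s}{l} \frac{(4)_{r+l}}{4^{|k|+l}} \int_{(0,1]} x^{|k|+l} (1-x)^{s-l}\, \frac{1}{x^2}\, F(dx) \end{aligned} \tag{27}$$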



with $F$ from (22). As in the case of constant $\psi$, the third line in (26) gives the transition rates for a given merger into $r$ ($\le 4$) groups of sizes $k_1, \ldots, k_r$ when $\beta$ active ancestral lineages are present, with $s = \beta - |k| \ge 0$ lineages unaffected by a given merger of the $\Xi$-coalescent with

$$\Xi = \int_{[0,1]} \delta_{(x/4,\,x/4,\,x/4,\,x/4,\,0,\,0,\,\ldots)}\, F(dx)$$

(cf. Schweinsberg (2000a), Thm. 2). By way of example, $C_{2;2;0} = 1$. Now we can state the convergence of our ancestral recombination graph process with random $\psi$. The analogue of Theorem 1.2 is the following:



Theorem 1.3. Let $\{\xi^{n,N}(m), m \ge 0\}$ be the ancestral process of a sample of $n$ chromosomes in a population of size $N$ with offspring laws $\mathcal{L}(\Psi_N)$ which satisfy (20), (21) and (22), and assume the scaling relation (24) for the recombination rates. Then, starting from $\xi^{n,N}(0) \in \mathscr{A}_n$, we have that

$$\{\xi^{n,N}(\lfloor 4t/c_N \rfloor)\} \to \{\xi(t)\} \quad \text{as } N \to \infty,$$

in the sense of the finite-dimensional distributions on the interval $(0, \infty)$. The process $\{\xi(t)\}$ is the Markov chain with generator matrix (26) and initial value $\xi(0)$ given by

$$\xi(0) = \mathrm{cd}(\xi^{n,N}(0)) \in \mathscr{A}^{\mathrm{sm}}_n.$$

The proof is given in Section 1.4 in the Appendix.

While $c_N \ge 1/N^2$ by definition, in principle any decay behaviour of $c_N$ that is consistent with $\liminf_{N\to\infty} N^2 c_N \ge 1$, and hence any scaling relation derived therefrom between coalescent time scale and model census population size, is possible via a suitable choice of the family $\mathcal{L}(\Psi_N)$, $N \in \mathbb{N}$.

For an extreme example, let $\Psi_N \equiv \lfloor N^{\gamma} \rfloor$ for some $\gamma \in (0,1)$; then $c_N \sim N^{-2(1-\gamma)}$ and (22) is satisfied with $F = \delta_0$.



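The relation with the "fixed $\psi$" model is as follows: For Theorem 1.2 we used the simple mixture distribution

$$\mathbb{P}(\Psi_N = \lfloor \psi N \rfloor) = 1 - \mathbb{P}(\Psi_N = 1) = \frac{c}{N^2} \tag{28}$$

for $\Psi_N$, in which $\psi \in (0,1)$ and $c > 0$ are both constants. Our choice (28) of law for $\Psi_N$ gives, using (17),

$$c_N = \mathbb{E}\left[ \frac{\Psi_N(\Psi_N+3)}{N(N-1)} \right] = \Big(1 - \frac{c}{N^2}\Big) \frac{4}{N(N-1)} + \frac{c}{N^2}\, \frac{\lfloor \psi N \rfloor (\lfloor \psi N \rfloor + 3)}{N(N-1)} \sim \frac{1}{N^2}\, (4 + c\psi^2).$$

Define $1_{(0,\psi)}(x) = 1$ if $x \in (0,\psi)$, and $1_{(0,\psi)}(x) = 0$ otherwise. Our choice (28) further gives

$$\mathbb{P}(\Psi_N > Nx) = 1_{(0,\psi)}(x)\, \mathbb{P}(\Psi_N > Nx)$$

and therefore

$$\frac{1}{c_N}\, \mathbb{P}(\Psi_N > \lfloor Nx \rfloor) \to 1_{(0,\psi)}(x)\, \frac{c}{4 + c\psi^2} = \int_{(x,1]} y^{-2}\, F(dy)$$

with

$$F = \frac{4}{4 + c\psi^2}\, \delta_0 + \frac{c\psi^2}{4 + c\psi^2}\, \delta_\psi.$$

Furthermore, $\mathbb{E}[\Psi_N/N] = 1/N + O(1/N^2)$, thus

$$\frac{c_N}{4\, \mathbb{E}[\Psi_N/N]} \sim \frac{1}{N} \cdot \frac{4 + c\psi^2}{4},$$

and Theorem 1.2 follows from Theorem 1.3 (after rescaling time in the limit process $\{\xi(t)\}$ by a factor of $(4 + c\psi^2)/4$).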
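
The asymptotics of $c_N$ under the two-point mixture law can be checked numerically. The Python sketch below assumes $c_N = \mathbb{E}[\Psi_N(\Psi_N+3)]/(N(N-1))$ and represents an offspring law as a dictionary mapping $k$ to $\mathbb{P}(\Psi_N = k)$; this representation and the helper names are assumptions made for illustration.

```python
import math
from typing import Dict


def c_N(law: Dict[int, float], N: int) -> float:
    """c_N = E[Psi (Psi + 3)] / (N (N - 1)) for an offspring law {k: P(Psi_N = k)}."""
    return sum(p * k * (k + 3) for k, p in law.items()) / (N * (N - 1))


def mean_offspring_fraction(law: Dict[int, float], N: int) -> float:
    """E[Psi_N / N]: chance that a given individual is an offspring in one event."""
    return sum(p * k for k, p in law.items()) / N


def mixture_law(N: int, c: float, psi: float) -> Dict[int, float]:
    """Two-point law: floor(psi N) offspring with probability c / N^2, else one.
    Assumes N is large enough that floor(psi N) > 1."""
    p_big = c / N**2
    return {1: 1.0 - p_big, math.floor(psi * N): p_big}
```

For example, with `N = 10**5`, `c = 10` and `psi = 0.4`, `N**2 * c_N(...)` is close to $4 + c\psi^2 = 5.6$, and `N * c_N(...) / (4 * mean_offspring_fraction(...))` is close to $(4 + c\psi^2)/4 = 1.4$.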



The constant $C_{b;k} := C_{b;k_1,\ldots,k_r;s}$ in (27) depends on the probability measure $F$. The form of $F$ will no doubt be different for different populations. We reiterate that resolving the mechanism of sweepstake-style reproduction will require detailed knowledge of the reproductive behaviour and the ecology of the organism in question, along with comparison of model predictions to multi-locus genetic data. A candidate for $F$ may be the beta distribution with parameters $\vartheta > 0$ and $\gamma > 0$, in which case the constant $C_{b;k}$ in (26) takes the form ($|k| := k_1 + \cdots + k_r$)

$$C_{b;k} = 4 \sum_{l=0}^{s \wedge (4-r)} \binom{s}{l} \frac{(4)_{r+l}}{4^{|k|+l}}\, \frac{B(|k| + l + \vartheta - 2,\ s + \gamma - l)}{B(\vartheta, \gamma)}, \tag{29}$$

$B(\cdot, \cdot)$ being the Beta function.
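
The beta case can be evaluated with log-gamma functions. The Python sketch below assumes that the rate has the form $4\sum_{l}\binom{s}{l}\frac{(4)_{r+l}}{4^{|k|+l}}\,B(|k|+l+\vartheta-2,\ s+\gamma-l)/B(\vartheta,\gamma)$, obtained by integrating $x^{|k|+l-2}(1-x)^{s-l}$ against a Beta($\vartheta$, $\gamma$) density; the function names are hypothetical.

```python
from math import comb, exp, lgamma
from typing import Sequence


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def falling_factorial(a: int, m: int) -> int:
    """(a)_m = a (a-1) ... (a-m+1), with (a)_0 = 1."""
    out = 1
    for i in range(m):
        out *= a - i
    return out


def beta_xi_rate(ks: Sequence[int], s: int, theta: float, gamma: float) -> float:
    """Merger rate for groups of sizes ks (each at least 2) with s unaffected
    lineages, when F is assumed to be the Beta(theta, gamma) distribution."""
    r = len(ks)
    if not 1 <= r <= 4 or min(ks) < 2:
        raise ValueError("need one to four groups, each of size at least 2")
    k_total = sum(ks)
    total = 0.0
    for l in range(min(s, 4 - r) + 1):
        weight = comb(s, l) * falling_factorial(4, r + l) / 4 ** (k_total + l)
        ratio = exp(log_beta(k_total + l + theta - 2, s + gamma - l) - log_beta(theta, gamma))
        total += weight * ratio
    return 4.0 * total
```

For two lineages the rate is 1 for all parameter values, in agreement with $C_{2;2;0} = 1$; for the uniform distribution ($\vartheta = \gamma = 1$) and one bystander lineage it equals $\int_0^1 (1 - x/4)\,dx = 0.875$.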
Different scaling regimes 

The mechanism of sweepstake-style reproduction may be different for different populations, and the frequency of large offspring number events may also be different. The particular timescale of the large reproduction events (we chose $\varepsilon_N = c/N^2$) results in a separation of timescales of the limit process. Resolving the separation of timescales problem results in the ARG with generator (14). Different scalings of $\varepsilon_N$ result in different limit processes. By way of example, if $N^2 \varepsilon_N \to 0$, large offspring number events are negligible in a large population, and we obtain the ARG associated with the usual Wright-Fisher reproduction, which can be read off Equation (14) by taking $c = 0$. One other scaling regime may seem reasonable, namely taking large offspring number events to be more frequent than in Assumption (5), but not too frequent. In mathematical notation, $N^2 \varepsilon_N \to \infty$ and $N \varepsilon_N \to 0$. The ancestral process in this regime is again characterised by instantaneous separation of marked chromosomes into single-marked individuals, followed by coalescence and recombination occurring on the slow timescale. The probability of recombination is proportional to $N \varepsilon_N$ since the slow timescale must be in units proportional to $1/\varepsilon_N$. Hence, small reproduction events become negligible in the limit, and the generator of the limit process is given by



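$$G_{\xi,\xi'} = \begin{cases} \frac{\psi^2}{4}\, C_{\beta;2;\beta-2} & \text{if } \xi' = \mathrm{pairmerge}_{j_1,j_2}(\xi), \\ r^{(\ell)} & \text{if } \xi' = \mathrm{recomb}_{j,\ell}(\xi), \\ \frac{\psi^2}{4}\, C_{\beta;|J|} & \text{if } \xi' = \mathrm{groupmerge}_{J}(\xi), \\ 0 & \text{for all other } \xi' \ne \xi, \end{cases} \tag{30}$$

in which $C_{\cdot;\cdot;\cdot}$ is given by Equation (15). The requirement $N \varepsilon_N \to 0$ is needed to prevent an unreasonably high rate of recombination.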

Haploid analogs 

A haploid version of the above model, where only one parent contributes offspring at each timestep, is a specific example of a $\Lambda$-coalescent, where

$$\Lambda(dx) = \delta_0(dx) + c\psi^2\, \delta_\psi(dx), \quad \psi \in (0,1),\ c \in [0, \infty),$$

see e.g. Eldon and Wakeley (2006) and Birkner and Blath (2009). More precisely, as the population size $N$ tends to infinity, assume probability $1 - c/N^2$ for the small reproduction events and $c/N^2$ for the large reproduction events (i.e., choose $\varepsilon_N = c/N^2$), and speed up generation time by $N^2$. Again, by randomising $\psi$ and/or switching to different scaling regimes, it is possible to obtain any given $\Lambda$-coalescent as limiting genealogy.

Two-sex extensions 

Recent studies of the spawning behaviour of Atlantic cod indicate that cod adopt a lekking behaviour, in which males compete for females, and females exercise mate choice (Nordeide and Folstad 2000). Direct microsatellite DNA analysis indicates that although multiple paternity is sometimes detected, reproductive success is highly skewed among the males, i.e. most of the successfully fertilized eggs can be attributed to a single male (Hutchings et al. 1999). Our model thus seems a good approximation to the actual reproduction mechanism of cod. Modifications to allow two distinct sexes, and multiple paternity, are in principle straightforward.

More general recombination models 

Our model can easily be enriched to allow more general recombination events involving more than one crossover point at a time. Furthermore, by letting the number $L$ of loci tend to infinity, a continuous model, where $[0,1]$ represents a whole chromosome (as in Griffiths and Marjoram (1997)), can be accommodated in our framework.
Correlations in coalescence times 
The marginal process 

Every marginal process (marginal with respect to one fixed locus under consideration) of our ancestral recombination graph is a $\Xi$-coalescent (see Schweinsberg (2000a) for notation and details) with

$$\Xi = \delta_0 + \frac{c\psi^2}{4}\, \delta_{(\psi/4,\,\psi/4,\,\psi/4,\,\psi/4,\,0,\,0,\,\ldots)}.$$

For $r = 0$, all marginals are identical (realization-wise); in particular, times to the most recent common ancestor for different loci have correlation 1. However, in contrast to the classical setting, for $r \to \infty$ one expects that the loci will not completely decorrelate, but instead keep positive correlations, as pointed out to us by J.E. Taylor (personal communication). In particular, one will not obtain the product distribution. This observation is a potential starting point for designing tests for the presence of large reproduction events, by comparing correlations for loci at large distance (hence with high recombination rate) under a Kingman- and a $\Xi$-coalescent based ARG.

Correlation in coalescence times at two loci 

Correlations in coalescence times between two loci have been considered in the context of quantifying association between loci (McVean 2002). Eldon and Wakeley (2008) consider correlations in coalescence times for a haploid population model admitting large offspring numbers, in which the ancestral process only admits asynchronous multiple mergers of ancestral lineages. To illustrate the effects of the reproduction parameters on the coalescence times, we also consider the probability that coalescence occurs at the same time at the two loci, as well as the expected time until coalescence. The calculations to obtain the correlations for a sample of size two at two loci (following the approach and notation of Durrett (2002)) are shown in the Appendix, Section 1.5. As we are now considering the gene genealogy of unlabelled lineages, let us briefly state the sample space. Let $a$ and $b$ denote the types at loci $a$ and $b$, respectively. The three sample states before coalescence at either locus has occurred can be denoted as $(ab)(ab)$, $(ab)(a)(b)$, and $(a)(a)(b)(b)$. By $(ab)(ab)$ we denote the state of two chromosomes each carrying ancestral material at both loci. By $(ab)(a)(b)$ we denote the state of one $(ab)$ chromosome in addition to two chromosomes $(a)$ and $(b)$ carrying ancestral types at locus 1 and 2 only, respectively. The notation $(a)(a)(b)(b)$ denotes the state of four chromosomes each carrying ancestral types at only one locus. Let

$$h(i) := \mathbb{P}(\{T_1 = T_2\} \mid i), \quad i \in \{0,1,2\},$$

denote the probability that coalescence at the two loci occurs at the same time, given that the process starts in state $i$, in which $i$ refers to the number of double-marked chromosomes (2, 1, or 0). As we are working with the limiting model, all marked individuals are effectively single-marked. Under the usual (Kingman coalescent-based) ARG, $\lim_{r\to\infty} h(i) = 0$, as one would expect. Our model yields

$$\lim_{r \to \infty} h(i) = \frac{c\psi^4}{32 + 8c\psi^2 - c\psi^4}, \quad i \in \{0,1,2\}, \tag{31}$$

indicating that even unlinked loci remain correlated due to sweepstake-style reproduction. Figure 2 shows graphs of $h(i)$ as a function of $\psi$ for different values of $c$ and $r$. As expected, $h(i)$ increases with $\psi$, at a rate which increases with $c$.

Under the usual ARG, the expected time $\mathbb{E}_i[T_s]$ until coalescence at either locus, starting from state $i$, is given by $\mathbb{E}_i[T_s] = (1 + h(i))/2$. The random variable $T_s$ can be viewed as the minimum of the times until coalescence occurs at the two loci. As $r \to \infty$, the times $T_1$ and $T_2$ until coalescence at the two loci become independent and identically distributed exponentials (i.i.d.e.) with rate 1, whose minimum has expected value $1/2$. Under our model, the mean of $T_s$ is not that of the minimum of two i.i.d.e. with rate $1 + c\psi^2/4$, another reflection of the correlation in gene genealogies induced by sweepstake-style reproduction. Indeed, our model gives

$$\lim_{r \to \infty} \mathbb{E}_i[T_s] = \frac{1}{2} \left( \frac{4}{4 + c\psi^2 x} \right), \quad i \in \{0,1,2\},$$

in which $x = 1 - \psi^2/8$.

Under our model, $\mathbb{E}_i[T_s]$ decreases with $\psi$, and the rate of decrease increases with $c$ (see the corresponding figure). The same pattern holds for the expected time $\mathbb{E}_i[T_l]$ until coalescence has occurred at both loci (see the corresponding figure). As $r \to \infty$, $\mathbb{E}_i[T_l]$ associated with the usual ARG approaches the expected value ($3/2$) of the maximum of two i.i.d.e. with rate 1. Under our model,

$$\lim_{r \to \infty} \mathbb{E}_i[T_l] = \frac{3}{2} \cdot \frac{1}{\big(1 + \frac{c\psi^2}{4}\big)\big(1 + \frac{c\psi^2}{4} - \frac{c\psi^4}{32}\big)} + \frac{c\psi^2 (6 - \psi^2)}{(c\psi^2 + 4)(4 + c\psi^2 - c\psi^4/8)},$$

while the maximum of two i.i.d.e. with rate $\lambda$ has expected value $3/(2\lambda)$.
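
The limits for unlinked loci can be recovered from three rates. The Python sketch below assumes that, as $r \to \infty$, the four lineages of state $(a)(a)(b)(b)$ effectively always sit on separate chromosomes, so that each locus coalesces at rate $1 + c\psi^2/4$ and both loci coalesce in the same event at rate $c\psi^4/16$ (all four lineages take part, and each pair picks a common parental chromosome); this simplification and the function name are assumptions made for illustration.

```python
from typing import Tuple


def unlinked_limits(c: float, psi: float) -> Tuple[float, float, float]:
    """Return (h, E[T_s], E[T_l]) in the limit of free recombination."""
    pair_rate = 1.0 + c * psi**2 / 4.0        # coalescence rate at one locus
    both_rate = c * psi**4 / 16.0             # both loci coalesce in the same event
    either_rate = 2.0 * pair_rate - both_rate  # some coalescence at locus 1 or 2
    h = both_rate / either_rate                # probability of simultaneous coalescence
    e_ts = 1.0 / either_rate                   # mean time to the first coalescence
    e_tl = 2.0 / pair_rate - e_ts              # E[max] = E[T_1] + E[T_2] - E[min]
    return h, e_ts, e_tl
```

For `c = 0` this returns `(0.0, 0.5, 1.5)`, the values for two independent exponentials with rate 1, and for `c > 0` the third value agrees with the closed-form expression for the limit of $\mathbb{E}_i[T_l]$ given above.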

The correlation $\mathrm{cor}_i(T_1, T_2)$ between $T_1$ and $T_2$ when starting from one of the three possible sample states $i \in \{0,1,2\}$ (see Appendix) increases with $\psi$, and more so if $c$ is large (see the corresponding figure). One obtains the following limit relations between $h(i)$ and $\mathrm{cor}_i(T_1, T_2)$ for $i \in \{0,1,2\}$:

$$\lim_{r \to \infty} \mathrm{cor}_i(T_1, T_2) = \lim_{r \to \infty} h(i) \quad \text{(see Eq. (31))};$$

$$\lim_{r \to 0} \mathrm{cor}_i(T_1, T_2) = \lim_{r \to 0} h(i) \quad \text{(see Eq. (70))};$$

$$\lim \mathrm{cor}_i(T_1, T_2) = \lim h(i) \quad \text{(see Eq. (69))}.$$



Quantifying the association between alleles at different loci can give insight into the evolutionary history of populations. Let $f_a$ and $f_b$ denote the frequencies of alleles $a$ at locus 1 and $b$ at locus 2, and let $f_{ab}$ denote the frequency of chromosome $ab$ in the total population. The statistic $D_{ab} := f_{ab} - f_a f_b$ measures the deviation from independence, since if the two loci were evolving independently, $f_{ab} = f_a f_b$. A related quantity is the $r^2$ statistic, defined as

$$r^2 := \frac{D_{ab}^2}{f_a (1 - f_a)\, f_b (1 - f_b)}$$

(Hill and Robertson 1968), assuming $f_a, f_b \notin \{0, 1\}$. In applications, one would like to compare observed values of $r^2$ calculated from data to the expected value $\mathbb{E}[r^2]$, obtained under an appropriate population model. Calculating the expected value of $r^2$ is not straightforward, since $r^2$ is a ratio of correlated random variables. The expected value of $r^2$ is, instead, approximated by the ratio $\mathcal{D} = \mathbb{E}[D_{ab}^2] / \mathbb{E}[f_a (1 - f_a) f_b (1 - f_b)]$ (Ohta and Kimura 1971).
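
As a small illustration of the two statistics just defined, the following Python sketch computes $D_{ab}$ and $r^2$ from given haplotype and allele frequencies; the function name is hypothetical.

```python
from typing import Tuple


def linkage_disequilibrium(f_ab: float, f_a: float, f_b: float) -> Tuple[float, float]:
    """Return (D_ab, r^2) for haplotype frequency f_ab and allele frequencies f_a, f_b."""
    denom = f_a * (1.0 - f_a) * f_b * (1.0 - f_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when an allele frequency is 0 or 1")
    d_ab = f_ab - f_a * f_b
    return d_ab, d_ab**2 / denom
```

For instance, `linkage_disequilibrium(0.5, 0.5, 0.5)` describes complete association (allele $a$ always occurs together with $b$) and returns `(0.25, 1.0)`, while `linkage_disequilibrium(0.25, 0.5, 0.5)` returns `(0.0, 0.0)`.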

A prediction $\mathcal{D}$ of linkage disequilibrium in the population can be framed in terms of correlations in coalescence times between two loci for a sample of size two, assuming a small mutation rate (McVean 2002). The prediction rests on approximating the expected value $\mathbb{E}[r^2]$ of the squared correlation statistic $r^2$ (Hill and Robertson 1968) of association between alleles at two loci by the ratio of expected values (Ohta and Kimura 1971). Following e.g. Durrett (2002), one can obtain expressions for correlations in coalescence times between two loci for a sample of size two (see Appendix). Under our model, one obtains the limit results

$$\lim_{r \to \infty} \mathcal{D} = 0, \qquad \lim_{c \to \infty} \mathcal{D} = \frac{\psi^3 - 16\psi^2 + 56\psi - 80}{\psi^3 - 10\psi^2 + 88\psi - 176}.$$

When $\psi$ is small but $c$ large, one obtains

$$\mathcal{D} \approx \frac{5 - 7\psi/2}{11 - 11\psi/2} + O(\psi^2).$$

Under the usual ARG, $\lim_{r \to 0} \mathcal{D} = 5/11$. Thus, even in the presence of a high recombination rate, if large offspring number events are frequent enough, one may only see evidence of a low recombination rate in data. Further, the prediction $\mathcal{D}$ can be substantially higher than Kingman-coalescent based predictions if $c$ is large and the recombination rate is not too small (see the corresponding figure).



For particular examples of probability measures $F$ from Equation (27), associated with the generator derived from our random offspring distribution model, one can compute the quantities considered above in relation to fixed $\psi$. One such example is the Beta($\vartheta$, $\gamma$) distribution. One obtains for $i \in \{0,1,2\}$,



lim h{i) 



47(1 + 21? + 7) 



87(1 + 7) + IO7?? + 7t?(1 + ??) 



Define $\tilde h(i) := \lim_{r \to \infty} h(i)$. For $i \in \{0,1,2\}$ one obtains

47(1 + 2t9 + 7) 



lim Ei[Ts] =4h{i) + 



87(1 + 7) + IO719 + 7t9(1 + 1?) ' 
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3 1 ~ 

limEi[Ti] = -- -h{i) + 2(8^(1 + ^) + ^g^^ + 7^(1 + ^)) ■ 



(32) 



The form of the relation shown in (32) between $\tilde h(i)$ and $\mathbb{E}_i[T_s]$ and $\mathbb{E}_i[T_l]$ resembles the one obtained for the Kingman coalescent-based ARG, with the addition of a 'correction' term due to simultaneous multiple mergers.

Variance of pairwise differences

The expected variance of pairwise differences was employed by Wakeley (1997) to estimate the recombination rate in low offspring number (Wright-Fisher) populations, under the usual ancestral recombination graph. Let the random variable $K_{ij}$ denote the number of differences between sequences $i$ and $j$, with $K_{ii} = 0$. The average number $\pi$ of pairwise differences for $n$ sequences is

$$\pi = \frac{2}{n(n-1)} \sum_{i<j} K_{ij}.$$

The (empirical) variance $S^2_\pi$ of pairwise differences is defined as

$$S^2_\pi = \frac{2}{n(n-1)} \sum_{i<j} (K_{ij} - \pi)^2.$$

In the Appendix we derive the expected variance of pairwise differences $\mathbb{E}[S^2_\pi]$ under the ancestral recombination graph described by the generator $G$ (14) derived from our large offspring number model. Under our model, $\mathbb{E}[S^2_\pi]$ is a function of the parameters $c$ and $\psi$, in addition to being a function of $r$ and $\theta$ (Figures 8 and 9). In Figure 8, $\mathbb{E}[S^2_\pi]$, when only two loci are considered, is graphed as a function of the recombination rate, and in Figure 9 as a function of sample size. Figures 8 and 9 show that $\mathbb{E}[S^2_\pi]$ is primarily influenced by the mutation rate ($\theta$) when the values of $c$ and $\psi$ are fairly modest. However, $\mathbb{E}[S^2_\pi]$ can be quite low when both $c$ and $\psi$ are large, even when $\theta$ is also large (see Figures 8 and 9). When $c$ and $\psi$ are both large, two sequences are more likely to coalesce before a mutation separates them.

The variance of pairwise differences alone will not suffice to yield estimates of $r$ if both $c$ and $\psi$ are unknown. To jointly estimate the four parameters ($c$, $\psi$, $r$, $\theta$) of our model one probably needs to employ computationally heavy likelihood and importance sampling methods in the spirit of Fearnhead and Donnelly (2001). However, given knowledge of $c$ and $\psi$, one can, in principle, use the variance of pairwise differences to quickly obtain estimates of the recombination rate.
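
The two summary statistics can be computed from aligned sequences as follows. This Python sketch assumes the definitions $\pi = \frac{2}{n(n-1)}\sum_{i<j} K_{ij}$ and $S^2_\pi = \frac{2}{n(n-1)}\sum_{i<j}(K_{ij} - \pi)^2$, with sequences given as equal-length strings; the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def pairwise_difference_stats(seqs: Sequence[str]) -> Tuple[float, float]:
    """Return (pi, S^2_pi): mean and empirical variance of the number of
    pairwise differences K_ij over all unordered pairs i < j."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    n_pairs = n * (n - 1) // 2  # so that 1 / n_pairs = 2 / (n (n - 1))
    pi = sum(diffs) / n_pairs
    s2 = sum((k - pi) ** 2 for k in diffs) / n_pairs
    return pi, s2
```

For the three sequences `AAAA`, `AAAT` and `AATT` the pairwise differences are 1, 2 and 1, giving $\pi = 4/3$ and $S^2_\pi = 2/9$.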
Correlations in ratios of coalescence times 

The behaviour of the correlations in ratios of coalescence times for sample sizes larger than two is investigated using Monte Carlo simulations.

Let $L_i$ denote the total length of branches ancestral to $i$ sequences at one locus, let $L$ denote the total length of the genealogy at the same locus, and define $R_i := L_i/L$. Thus, $R_1$ is the ratio of the total length of external branches to the total size of the genealogy. The idea behind estimating the expected value $\mathbb{E}[R_i]$ is as follows. Assuming the infinitely many sites mutation model, let $S_i$ denote the total number of mutations in $i$ copies, $S$ the total number of segregating sites, and define $V_i := S_i/S$. The key idea behind deriving the coalescent was to separate the (neutral) mutation process from the genealogical process. The same principle also applies to predicting patterns of genetic variation using the coalescent: first one constructs the genealogy, and then superimposes mutations on the genealogy. The shape of the genealogy is thus a deciding factor in the genetic patterns one predicts. The relative lengths $R_i$ of the different types of branches should therefore predict the relative number $V_i$ of mutations of each class. This idea is exploited by Eldon (2011) to estimate coalescence parameters in the large offspring number models introduced by Schweinsberg (2003) and Eldon and Wakeley (2006). Namely, the claim is



\[ \lim_{n\to\infty} E[R_i] = \lim_{n\to\infty} E[V_i] = f(w,i) \tag{33} \]

where $n$ denotes the sample size and $w$ denotes the coalescence (reproduction) parameters. Indeed, it
follows from the results of Berestycki et al. (2007, 2008) that, for $1 < \alpha < 2$,

\[ \lim_{n\to\infty} E[R_i] = \lim_{n\to\infty} E[V_i] = \frac{\Gamma(i+\alpha-2)(\alpha-1)(2-\alpha)}{\Gamma(\alpha)\, i!} \]

when associated with the Beta$(2-\alpha, \alpha)$ coalescent derived by Schweinsberg (2003) from a
population model in which the offspring law is stable with index $\alpha$. A key feature of expression (33)
is the absence of the mutation rate in the function $f(w,i)$; thus, given a large number of DNA sequences
(possibly in the thousands), one hopes to be able to obtain estimates of the coalescence parameters
$w$ without having to jointly estimate the mutation rate. In our model, there are four parameters to
estimate, namely mutation and recombination rates, along with the coalescence parameters $c$ and
$\psi$. Even though full likelihood methods exist (Birkner and Blath 2008, Birkner et al. 2011),
applying them to large datasets consisting of thousands of sequences may represent a challenge.
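The Beta-coalescent limit quoted above is straightforward to evaluate numerically. The following Python sketch is only an illustration: the function name `beta_coalescent_limit_ratio` is our own and does not come from the cited works. It evaluates the right-hand side on the log scale with `math.lgamma`, which avoids overflow of the Gamma function for large $i$.

```python
from math import exp, lgamma, log


def beta_coalescent_limit_ratio(alpha, i):
    """Evaluate Gamma(i+alpha-2)(alpha-1)(2-alpha) / (Gamma(alpha) i!) for 1 < alpha < 2."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 1 and 2")
    if i < 1:
        raise ValueError("i must be a positive integer")
    # All Gamma arguments are positive here, so working with logarithms is safe.
    log_value = (lgamma(i + alpha - 2.0) + log(alpha - 1.0) + log(2.0 - alpha)
                 - lgamma(alpha) - lgamma(i + 1.0))
    return exp(log_value)


if __name__ == "__main__":
    for i in range(1, 5):
        print(i, beta_coalescent_limit_ratio(1.5, i))
```

For $i = 1$ the expression reduces to $2 - \alpha$, and the values summed over all $i \ge 1$ add up to one, as they must for limiting proportions; both facts give a quick sanity check on any implementation.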

Estimates of $E[R_i]$ as functions of the sample size $n$ and the coalescence parameters $c$ and $\psi$
are shown in Table 4. In nearly all cases the estimates $\bar{R}_i$ decreased as sample size increased; the
exception was $\bar{R}_1$ when $(c, \psi) = (1000, 0.5)$ (Table 4). When both $c$ and $\psi$ are large enough, we
observe a non-monotonic behaviour in $\bar{R}_i$ as sample size increases (results not shown). The
non-monotonic behaviour may be related to the property of the marginal haploid process (the
point-mass part obtained as $c \to \infty$) of a single locus of not coming down from infinity (Schweinsberg
2000b), i.e. when one starts with an infinite number of lineages (sample size), the number of lineages
stays infinite. For such processes that do not come down from infinity, the ratio $R_1$ should go to
one, i.e. the gene genealogy should become completely star-shaped (see e.g. Eldon (2011)). As
both $c$ and $\psi$ increase, one expects the deviation from Kingman-coalescent based predictions to
increase. By way of example, for sample size 50 the vector $(E[R_1], \ldots, E[R_4])$ is estimated to be
approx. (0.24, 0.12, 0.08, 0.06) when associated with the Kingman coalescent ($c = 0$), while being
approx. (0.58, 0.20, 0.09, 0.05) when $(c, \psi) = (1000, 0.5)$. In all cases the estimate of the standard
deviation of $R_i$ decreases as sample size increases, indicating convergence.
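To make the definition of the ratios concrete, the following Python sketch estimates $E[R_i]$ by Monte Carlo for the Kingman coalescent only (the case $c = 0$). It is a minimal single-locus illustration under that assumption, not the C program used for the tables, and the function names are our own. Each active lineage carries the number of sampled sequences it is ancestral to, so the waiting time between mergers can be added to the right class $L_i$.

```python
import random


def kingman_branch_ratios(n, rng):
    """Return [R_1, ..., R_{n-1}] for one Kingman genealogy of n sequences."""
    sizes = [1] * n              # sampled sequences subtended by each active lineage
    lengths = [0.0] * (n + 1)    # lengths[i] accumulates L_i
    k = n
    while k > 1:
        # With k lineages the time to the next merger is Exp(k(k-1)/2).
        wait = rng.expovariate(k * (k - 1) / 2.0)
        for s in sizes:
            lengths[s] += wait
        # Merge a uniformly chosen pair of lineages.
        i, j = rng.sample(range(k), 2)
        merged = sizes[i] + sizes[j]
        sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
        sizes.append(merged)
        k -= 1
    total = sum(lengths)
    return [lengths[i] / total for i in range(1, n)]


def estimate_mean_ratios(n, reps, seed=1):
    """Average the ratios R_i over `reps` independent genealogies."""
    rng = random.Random(seed)
    sums = [0.0] * (n - 1)
    for _ in range(reps):
        for idx, r in enumerate(kingman_branch_ratios(n, rng)):
            sums[idx] += r
    return [s / reps for s in sums]


if __name__ == "__main__":
    print(estimate_mean_ratios(50, 1000)[:4])
```

For sample size 50 the first four averages should land close to the Kingman values quoted in the text; extending the sketch to multiple mergers would require replacing the pair-merger step by the merger mechanism of the model.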

The rationale behind comparing the statistics in Tables 5 and 6 is as follows. As sequencing
technologies advance, and the genomic sequences of more organisms become available, a case in
point being the recently published genomic sequence of Atlantic cod (Star et al. 2011), genomic
scans of thousands of individuals will become more common. Given DNA sequence data for many
loci, one could calculate correlations for counts and ratios of counts of mutations, and compare
them to predictions based on different ancestral recombination graphs. As for the single-locus
statistics (Table 4), the idea is that the correlations of the coalescence time statistics ($L_i$ and
$R_i$) should reflect correlations of mutation counts ($S_i$). In particular, under the usual ARG one
expects (see Tables 5 and 6)

\[ \lim_{r\to\infty} \mathrm{cor}\big(L_i^{(1)}, L_j^{(2)}\big) = \lim_{r\to\infty} \mathrm{cor}\big(R_i^{(1)}, R_j^{(2)}\big) = 0, \]

where the superscript refers to locus number one and two, respectively, while under an ARG
admitting simultaneous multiple mergers one expects

\[ \lim_{r\to\infty} \mathrm{cor}\big(L_i^{(1)}, L_j^{(2)}\big) = f(i,j,w), \qquad \lim_{r\to\infty} \mathrm{cor}\big(R_i^{(1)}, R_j^{(2)}\big) = g(i,j,w), \]

where $f$ and $g$ are functions of the particular statistics indicated by $i$ and $j$ as well as the vector $w$
of coalescence (reproduction) parameters.

In general, the results reported in Tables 5 and 6 indicate that high values of both $\psi$ and $c$ are
required for high correlations when recombination rate is high, when associated with our model. In
particular, the correlations between $R_i^{(1)}$ and $R_i^{(2)}$ (i.e. between corresponding $R_i$'s at different loci)
can be quite high, even when recombination is high, when both $c$ and $\psi$ are large enough; another
indicator of the genome-wide correlations induced by sweepstake-like reproduction.

A different question concerns the limit behaviour as sample size $n$ increases. Fix the
recombination rate and consider the limits

\[ \lim_{n\to\infty} \mathrm{cor}\big(R_i^{(1)}, R_j^{(2)}\big), \qquad \lim_{n\to\infty} \mathrm{cor}\big(V_i^{(1)}, V_j^{(2)}\big). \tag{34} \]

Under the usual ARG, one expects the limits in (34) to be only functions of the recombination rate
(and $i$ and $j$). If the ARG also admits simultaneous multiple mergers, one expects the limits in (34)
also to be functions of $w$. Considering unlinked loci, one would be interested in the limits

\[ \lim_{r\to\infty} \lim_{n\to\infty} \mathrm{cor}\big(R_i^{(1)}, R_j^{(2)}\big), \qquad \lim_{r\to\infty} \lim_{n\to\infty} \mathrm{cor}\big(V_i^{(1)}, V_j^{(2)}\big). \tag{35} \]

Resolving the limits (35) for different ARGs promises not only to yield insights into genome-wide
correlations, but also to provide tools for inference; e.g. to distinguish between different population
models.

The C program written to perform the simulations was checked by comparing correlation in 
coalescence times for sample size two at two loci to analytical results. The program is available 
upon request. 
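A check of this kind can be illustrated for the usual ARG, the special case without multiple mergers. The Python sketch below is not the authors' C program. It assumes the classical parametrisation in which each pair of lineages coalesces at rate 1 and each lineage carrying both loci recombines at rate $\rho/2$; this scaling of $\rho$ is our assumption and need not match the recombination parameter used elsewhere in the text. Under that parametrisation the correlation of the two coalescence times for a sample of size two is the classical $(\rho + 18)/(\rho^2 + 13\rho + 18)$, and the simulation follows the three-state chain (two, three or four ancestral lineages) behind that formula.

```python
import random


def two_locus_times(rho, rng):
    """Simulate (T1, T2) for a sample of two chromosomes at two loci, usual ARG."""
    state = "A"   # A: 2 lineages carrying both loci; B: 3 lineages; C: 4 lineages
    t = 0.0
    while True:
        if state == "A":
            coal, rec = 1.0, rho
        elif state == "B":
            coal, rec = 3.0, rho / 2.0
        else:
            coal, rec = 6.0, 0.0
        t += rng.expovariate(coal + rec)
        if rng.random() < rec / (coal + rec):
            state = "B" if state == "A" else "C"   # recombination splits a lineage
            continue
        if state == "A":
            return t, t                            # both loci coalesce together
        u = rng.random()
        if state == "B" and u < 1.0 / 3.0:
            state = "A"                            # the two single-locus lineages merge
            continue
        if state == "C" and u < 4.0 / 6.0:
            state = "B"                            # a locus-1 and a locus-2 lineage merge
            continue
        # One locus has coalesced; the other has two lineages left: Exp(1) more time.
        other = t + rng.expovariate(1.0)
        return (t, other) if rng.random() < 0.5 else (other, t)


def sample_correlation(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


def classical_correlation(rho):
    return (rho + 18.0) / (rho ** 2 + 13.0 * rho + 18.0)


if __name__ == "__main__":
    rng = random.Random(2024)
    sims = [two_locus_times(2.0, rng) for _ in range(200000)]
    print(sample_correlation(sims), classical_correlation(2.0))
```

The simulated and analytical values should agree up to Monte Carlo error; a persistent discrepancy would point to a bug in the transition rates.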



Comparison with Eldon and Wakeley (2008) 



Eldon and Wakeley (2008) consider correlations in coalescence times, and the prediction $\mathcal{D}$ of
linkage disequilibrium, under a modified Wright-Fisher sweepstake-style reproduction model, and
observe correlations in coalescence times between loci despite high recombination rate. Our work
differs from theirs in important ways. To begin with, we treat diploidy in detail, in which each
offspring receives its two chromosomes from two distinct diploid parents. This leads to a separation
of timescales of the ancestral process. We formally derive an ancestral recombination graph which
admits simultaneous multiple mergers of ancestral lineages, which naturally arise in diploid models.
Eldon and Wakeley observed correlations in coalescence times when considering only sample size
two at each locus in a model that contains diploid individuals only implicitly; it is not a priori
obvious that the correlations would still hold for large sample sizes. We confirm this using our
formally obtained ARG, which also allows us to investigate correlations in coalescence times, and in
ratios of coalescence times, for sample sizes larger than two at each locus. In addition, one can
apply our ARG to inference problems. Indeed, we show how the variance of pairwise differences
can, in principle, be used to obtain estimates of the recombination rate. Finally, we obtain a large
class of ARGs by randomizing the offspring distribution; thus one is not restricted to the simple
case of fixed $\psi$.

Furthermore, since the estimate $\mathcal{D}$ of the expected value of $r^2$ can be expressed in terms of
correlations in coalescence times, Eldon and Wakeley consider $\mathcal{D}$ under their modified Wright-Fisher
model. However, $\mathcal{D}$ is based on approximating an expected value of a ratio of correlated random
variables by the ratio of expected values of the corresponding random variables, and is also derived
for a sample of size two at two loci. Thus, $\mathcal{D}$ may not be the ideal quantity to quantify association
between loci for large sample sizes. A more natural way may be to investigate correlations in
coalescence times for samples larger than two the way we do.

Discussion 

Understanding the genome-wide effects of sweepstake-like reproduction on gene genealogies was
our main aim. To this end, we derived ancestral recombination graphs for many loci arising from
population models admitting large offspring numbers. High variance in individual reproductive
success, or sweepstake-style reproduction, has been suggested to explain the low genetic diversity
observed in many marine populations (Hedgecock et al. 1982, Hedgecock 1994, Avise et al.
1988, Palumbi and Wilson 1990, Beckenbach 1994, Arnason 2004). Hedgecock and Pudovkin
(2011) review the sweepstake-style reproduction hypothesis, and conclude that it provides
the correct framework in which to investigate many natural marine populations.



Multiple (Donnelly and Kurtz 1999, Pitman 1999, Sagitov 1999) and simultaneous
(Schweinsberg 2000a, Möhle and Sagitov 2001) multiple merger coalescent models arise from
population models incorporating sweepstakes reproduction by admitting large offspring numbers
(Sagitov 2003, Eldon and Wakeley 2006, Sargsyan and Wakeley 2008). While multiple
merger coalescent processes describing the ancestral relations of alleles at a single locus have
received the most attention from mathematicians, ancestral processes for multiple linked loci have
hitherto remained unexplored. We derive an ancestral recombination graph for many loci from a
diploid biparental population model, in which one pair of diploid individuals (parents) contribute
offspring to the population at each timestep. Thus, each offspring necessarily receives her
chromosomes from distinct individuals, as diploid individuals tend to do. Incorporating diploidy into our
model the way we do leads to a separation of timescales problem. Our limiting object is essentially
a 'haploid' process, in which chromosomes either coalesce or recombine. By extending a result of
Möhle (1998), we show that diploidy, a fundamental characteristic of many natural populations,
can thus be treated as a 'black box', since the limiting object does not depend on the location of
chromosomes in individuals.

By adopting a Moran type model, in which only a single pair of individuals gives rise to offspring
at each reproduction event, we chose mathematical tractability over more biologically realistic
scenarios in which, for example, many individuals contribute offspring at each timestep. It should be
straightforward to extend our model in many ways, for example by allowing a random number of parents,
or by introducing population structure. Indeed, we do extend our model in one way, by taking a
random offspring distribution. These extensions still leave open the question of distinguishing among
different large offspring number models. Our work on ancestral recombination graphs incorporating
information from many loci is a step in this direction.

Sweepstake-style reproduction induces correlation in coalescence times even between loci
separated by a high rate of recombination. The correlation follows from the multiple merger property of
our ancestral recombination graph, since many chromosomes coalesce at the same time in a multiple
merger event. The correlation remains a function of the coalescence parameters ($c$ and $\psi$) of our
population model. An immediate question is the effects on predictions of linkage disequilibrium
(LD). The approximation $\mathcal{D}$ by McVean (2002) predicts low LD when recombination rate is high.
However, when the rate of large reproduction events is high ($c \to \infty$), $\mathcal{D}$ remains a function of
the coalescence parameters. The dependence of $\mathcal{D}$ on coalescence parameters has implications for
the use of LD in inference for populations exhibiting sweepstake-style reproduction. Using
simulations, Davies et al. (2007) found little effect of multiple mergers on the prediction $r^2$ of linkage
disequilibrium, when comparing the exact Wright-Fisher model with recombination to the usual
(continuous-time) ARG. However, by directly incorporating large offspring number events the way
we do, we can show that large offspring number events do induce correlation in coalescence times,
and hence influence predictions of linkage disequilibrium.

The genome-wide correlation in coalescence times (Tables 5 and 6) induced by sweepstake-style
reproduction offers hints about how to distinguish between large offspring number and ordinary
Wright-Fisher reproduction. We are unaware of any published multi-loci methods derived to
distinguish among different population models. Full likelihood methods may be preferable to the simple
moment-based methods we consider. However, likelihood-based inference tends to be
computationally intensive, and more so for large samples. For large samples, one should be able to quickly obtain
a good idea of the underlying processes by comparing correlations in ratios of mutation counts with
predictions based on different population models.

In conclusion, ancestral recombination graphs admitting simultaneous multiple mergers of ances- 
tral lineages are derived from a diploid population model of sweepstake-style reproduction, suggested 
to be common in many diverse marine populations. Our calculations show that sweepstake-style 
reproduction results in genome-wide correlation of gene genealogies, even for large sample sizes. 
Estimates of linkage disequilibrium and of recombination rates are confounded by the coalescence 
parameters of our population model. The genome-wide correlation in gene genealogies induced 
by sweepstake-style reproduction implies that examining correlations between loci should provide 
means of distinguishing between ordinary Wright-Fisher and sweepstake-style reproduction.

1 Appendix

1.1 Overview of transitions and their probabilities in the finite population model 

1.1.1 Basic setup and notation 

We will now classify all transitions and their probabilities of our population model relevant for
the ancestral process under the scaling $\varepsilon_N = c/N^2$, in which $N$ denotes the population size. Fix
a sample size $n$ for this section. Usually we suppress the dependence on the sample size in the
notation below. Recall the state space $\mathcal{A}_n$ of our ancestral process (resp. $\mathcal{A}_n^{\mathrm{sm}}$ for the 'effective'
limiting model).

Let $\Pi_N$ be the transition matrix of the Markov chain $\{\xi^{n,N}(m)\}_{m=0,1,\ldots}$ on $\mathcal{A}_n$ describing the
ancestral states of an $n$-sample in a population of size $N$. Our aim is to decompose $\Pi_N$ into

\[ \Pi_N = A_N + \frac{1}{N^2} B_N + R_N \tag{36} \]

where the matrix $A_N$ contains all transitions whose probability is $O(1)$ or $O(N^{-1})$ per generation, so
that they will happen 'instantaneously' in the limit, and either are identity transitions, or projections
from $\mathcal{A}_n$ to $\mathcal{A}_n^{\mathrm{sm}}$ by means of dispersing chromosomes paired in double-marked individuals. The
matrix $B_N$ contains all transition probabilities which are positive and finite after multiplication
with $N^2$ and $N \to \infty$, that is, our 'effective transitions'. The remainder matrix $R_N$ carries only
transition probabilities that are of order $O(N^{-3})$ or smaller, that will thus vanish after scaling.

Once we have established this decomposition, we can apply Lemma 1.7 below in a suitable way
in order to identify the limit given in Definition 1.1 and establish the convergence result, i.e.
Theorem 1.2.

In Tables 1-3 we will schematically deal with all possible transitions that can happen to a
current sample over one timestep.



Analogous to the notation and convention of Möhle and Sagitov (2003), we assume that in
every configuration $\xi^{n,N}(m)$ from (2), for the order of chromosomes in individuals $I_i$ for $i \in [b(m)]$ we
have

\[ \begin{aligned}
I_i(m) &= \{C^{(2i-1)}(m), C^{(2i)}(m)\} && \text{if } 1 \le i \le \beta(m) - b(m); \\
I_i(m) &= \{C^{(\beta(m)-b(m)+i)}(m), \emptyset\} && \text{if } \beta(m) - b(m) + 1 \le i \le b(m).
\end{aligned} \tag{37} \]

For ease of presentation, we denote by
I' a single-marked individual carrying one active chromosome;
I" a double-marked individual carrying two active chromosomes;

I' a single- marked individual (parent) whose marked chromosome is not passed on in the sample 
during a given reproduction event; 

I" a double-marked individual (parent) where one marked chromosome is passed on and the other 
not during a given reproduction event. 

The symbols (A), (B) and (R) in the tables denote whether the corresponding transitions belong
to $A_N$ (A), to $B_N$ (B) or the 'remainder term' (R) in (36) according to the decomposition mentioned
above. After that, we compute all the important probabilities explicitly. The order of the probability
of each transition is also noted in Tables 1-3.

1.1.2 Transition type 1: Small or large reproduction event, no offspring in the sample 

If a reproduction event takes place, say at generation $m$, that does not affect our sample, this will
not affect the state of our ancestral process at $m + 1$, and we have $\xi^{n,N}(m) = \xi^{n,N}(m + 1)$. Hence,
we see an identity transformation. We now compute the probability that our sample is not affected.
Given current state $\xi \in \mathcal{A}_n$ with $b$ individuals and $\beta$ chromosomes (hence $\beta - b$ double-marked and
$2b - \beta$ single-marked individuals), the probability that no child is in the sample is

\[ (1 - \varepsilon_N)\,\frac{N-b}{N} + \varepsilon_N\,\frac{\binom{N-b}{\lfloor \psi N \rfloor}}{\binom{N}{\lfloor \psi N \rfloor}} = 1 - O(N^{-1}). \]
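The order of this probability can be checked numerically. The Python sketch below assumes the two-term form of the no-offspring probability: a small event with probability $1 - \varepsilon_N$ producing one offspring, and a large event with probability $\varepsilon_N = c/N^2$ producing $\lfloor \psi N \rfloor$ offspring, with offspring placed uniformly among the $N$ individuals. That reading of the garbled display is our assumption, and the function name is hypothetical.

```python
from math import comb, floor


def prob_no_offspring_in_sample(N, b, c, psi):
    """Probability that none of b sampled individuals is an offspring in one event."""
    eps = c / N ** 2                 # probability of a large reproduction event
    k = floor(psi * N)               # number of offspring in a large event
    small = (N - b) / N              # single offspring avoids the b sampled individuals
    large = comb(N - b, k) / comb(N, k)   # all k offspring avoid the sample
    return (1 - eps) * small + eps * large


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        p = prob_no_offspring_in_sample(N, b=5, c=10.0, psi=0.5)
        print(N, N * (1 - p))        # should approach b as N grows
```

Since the large-event term carries the factor $c/N^2$, the complement is $b/N + O(N^{-2})$, which is why $N(1 - p)$ settles near $b$.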



1.1.3 Transition type 2: Small reproduction event, offspring in sample, at most one 
parent in the sample, no recombination 

Here, we only need to distinguish whether the offspring is single or double marked, and whether there
is a parent in the sample. For example, it is immediate to see that the probability of a transition from
a double-marked (I") offspring to two single-marked ({I', I'}) individuals is of order $O(N^{-1})$ when
no parent is in the sample and no recombination happens. Table 1 lists all corresponding events. By
way of example, the state labelled {I', I'} denotes that two single-marked individuals, each carrying
one active chromosome, are reached from the sample configuration. One such configuration is if
the sample contains one offspring, but neither parent ($\emptyset$), and the offspring is carrying two active
chromosomes (I").

Table 1: Transitions of type 2.



Offspring 


Parent with marked chroniosonie(s) 
(0 means no parent in sample) 

I' 


r 


{F,I'} {A) 

(*) o(iv-i) 


{F,F},{r,F} 
0(7V-2), (fi) 


r 


{!'} {A) 

(**) 0{N-^) 


{F}, {F'}, {F,F}, (i?) 
(t) 0{N-^) 




i' 


I" 


r 


o(iv-^), (i?) 


(t) 0(iV-^), (S) 


r 


{!"}, {1',!'}, (i?) 

0{N-^) 


{F,F'}, {F}, (i?) 



1.1.4 Transition type 3: Small reproduction event, offspring in sample, both parents 
in the sample 

If both parents and offspring are in the sample in a small event, this immediately gives a transition
probability of order $O(N^{-3})$ or smaller (depending on the presence of recombination), hence will
be irrelevant, and be part of $R_N$. We omit a detailed table listing the different single- and
double-marked individuals.

1.1.5 Transition type 4: Small reproduction event, offspring and at most one parent 
in sample, recombination occurs 

Table 2 lists transitions due to recombination when neither parent is in the sample. The
probability of the presence of both an offspring and at least one parent in a sample, when recombination
occurs, is of order $O(N^{-3})$, and so will vanish in the limit.

Table 2: Transitions of type 4, neither parent in sample 



Offspring 


Parent 




F' 


{r,F}, o(iv-2), (5) 

{F',F'}, 0(iV-3), (i?) 


F 


F' , 0{N-''), (B)

1.1.6 Transition type 5: Large reproduction event, offspring in sample, no parent in
sample, no recombination

Table 3 lists all possible transitions when a large reproduction event occurs, no parent is in the
sample, and recombination does not occur. The probabilities of the events listed in Table 3 are of
order $O(N^{-2})$, and so will appear as effective transitions in the limit.

Table 3: Transitions of type 5. 



Offspring 


Parent 




ki F, k2 I" 


{I", I"}, 0{N-''), (B) 

{r,F}, o(iv-2), (s) 
{F,r}, o(iv-2), (s) 

I", 0{N-^), (B) 
F, 0{N-^), (B) 



1.1.7 Transition type 6: Large reproduction event, offspring in sample, recombination 
occurs and / or at least one parent in sample 

The probability that a large reproduction event takes place, at least one child and at least one
parent are in the sample is $O(N^{-3})$. In addition, the probability that a large reproduction event
takes place, at least one child is in the sample and also a recombination event happens in the sample
is $O(N^{-3})$. Hence all such events are negligible.

1.2 The convergence result 

1.2.1 The limit of the projection matrix $A_N$

Some care is needed in order to make sure $A_N$ converges in the right sense to the desired projection
matrix. The only relevant transitions of order $O(1)$ or $O(N^{-1})$ are transitions of type 1 and 2. The
only one which is not an identity transition is the first dispersion event of Table 1. For $\xi \in \mathcal{A}_n$ with
$b < \beta$ (i.e. at least one marked individual is double-marked), this is the transition from $\xi$ to
$\mathrm{disp}_i(\xi)$, in which the two chromosomes paired in the $i$-th double-marked individual are dispersed.
This event will become part of $A_N$, and has probability



/N-b-l\ 

A^itdlspdO) = (1 - e^)i-,^-^il - rj\ l<i</3-b (38) 






(this is the probability of the event (*) listed in row 1, column 1 of Table 1; note that the event
(**) listed in row 2, column 1 there leads to an identity transition). Otherwise, we have

A,{C,0 = 1 - (1 - gTv) ^ ^ ' ^ 1 - r^)' 

[2) 

Of course, $A_N$ has to leave elements of the subspace $\mathcal{A}_n^{\mathrm{sm}}$ invariant, hence we set, for $\xi$ with $b = \beta$,

\[ A_N(\xi, \xi') := 1_{\{\xi = \xi'\}}. \]

Proposition 1.4. With the above settings, $A_N$ is a stochastic matrix for each $N$ and

\[ \lim_{C\to\infty} \lim_{N\to\infty} \sup_{r \ge CN} \|A_N^r - P\| = 0 \tag{39} \]

where $P$ is the canonical projection from $\mathcal{A}_n$ to $\mathcal{A}_n^{\mathrm{sm}}$, i.e.

\[ P(\xi, \xi') = 1_{\{\xi' = \mathrm{cd}(\xi)\}}. \]



Proof of Proposition 1.4. The Markov chain with transition matrix $A_N$ can only change state by
dispersing the chromosomes paired in a double-marked individual. We see from (38) that

\[ A_N(\xi, \mathrm{disp}_i(\xi)) \ge \frac{K(n,r,c)}{N} \]

for some suitable constant $K(n,r,c)$, uniformly in $b$ and $i \le \beta - b$ and $N$ (for all $N$ large enough).
Hence, starting from $\xi$ with $\beta - b$ double-marked individuals, the number of $A_N$-steps required until
complete dispersion has occurred is dominated by the sum of $\beta - b$ independent geometric random
variables $\gamma_1 + \cdots + \gamma_{\beta-b}$, with success probability $K(n,r,c)/N$. By Markov's inequality,

\[ P(\gamma_1 + \cdots + \gamma_{\beta-b} > CN) \le \frac{(\beta-b)\,N/K(n,r,c)}{CN} = \frac{\beta-b}{C\,K(n,r,c)}. \]

The proof can now be completed with a coupling argument, noting that two Markov chains run
according to $A_N$ resp. $P$, started in $\xi \in \mathcal{A}_n$, both get stuck in $\mathrm{cd}(\xi)$, and this happens after at most
$CN$ steps with high probability (for $C$ large). □

1.2.2 Proof of the convergence result 

With the definition of $A_N$ from the previous section, put

\[ B_N^* := N^2(\Pi_N - A_N), \tag{40} \]

and let $P$ be the canonical projection from $\mathcal{A}_n$ to $\mathcal{A}_n^{\mathrm{sm}}$ defined in Proposition 1.4. The following
Lemma will identify $G$ as the limit containing all the 'effective' transitions of $B_N$ when projecting
on the subspace $\mathcal{A}_n^{\mathrm{sm}}$.

Lemma 1.5. We have

\[ B_N := P B_N^* P \to G \quad \text{as } N \to \infty \tag{41} \]

with $G$ from (14).

Remark 1.6. We do believe that in fact the sequence of (formally larger) matrices $B_N^*$ on $\mathcal{A}_n$
converges as well, but the statement about $B_N$ is sufficient for our purposes below (see (48) in
Lemma 1.7) and simpler to prove since it allows us to restrict to the 'completely dispersed'
configurations in $\mathcal{A}_n^{\mathrm{sm}}$.



Proof of Lemma 1.5. We inspect the types of events listed in Tables 1-3 that are marked with (B).
Events that are marked with (R) have probability of order at most $O(N^{-3})$, hence their total
contribution to any entry of $B_N$ is at most $O(N^{-1})$ (since we are following a finite sample, there
are only finitely many possible one-step events altogether). It suffices to consider $B_N(\xi, \mathrm{cd}(\eta))$ for
$\xi = \{C^{(1)}, \ldots, C^{(\beta)}; \beta\} \in \mathcal{A}_n^{\mathrm{sm}}$, $\eta \in \mathcal{A}_n$ (because $P$ projects to $\mathcal{A}_n^{\mathrm{sm}}$).

Regarding $\xi' = \mathrm{pairmerge}_{j_1,j_2}(\xi)$: This transition can happen in a small reproduction event
(these events are listed at (†) in Table 1 in row 2, column 2; note that events listed at (‡) in Table 1
lead to a trivial transition once $P$ is applied) or in a large reproduction event as in Table 3 if
the grouping is suitable. Up to four parental chromosomes are involved in any reproduction event.
Hence, a large reproduction event can lead to a given pair merger in the sample if up to 5 individuals
in the sample are children. Thus

BNiCa =N\1 - eN){l - rM)2 x ll-^^^il 

\ 2 J 

(For the first term on the right note that either ji or j2 can be the child, the two factors of ^ come from the 
requirement that the chromosome in the child we are following is the one from the parent in the sample and 
is also the one we are following in the parent. For the second term on the right note that once we decide 
on c children in the sample {y^Z^ choices because ji and J2 are already chosen), there are (4)c-i ways to 



assign them to the 4 parental chromosomes. For comparison with (15 1 and the first line in (14 1 observe 

( Wl -c) ^ {N-P)\ [N^\ \{N~ [JV^J ) ! {N^:Y{N{l-^,)f-^ ^ ^_, 

(l#^j) {VN^\-c)\{N-P-[N^\+c)mr NP VK V) ■ 

Regarding ^' = recombj_£(.^) (assuming that a is such that C"' can be non-trivially cut into 
two by a recombination event between loci d. — 1 and t): This transition can happen in a small 
reproduction event as listed at (**) in Table ^ or in another event that has probability 0{N~^). 
Hence 

-. /N-h\ (£) 

BN{i,0 = N\l - en) X -J,\j^'-JT + 0{N-') = rW + ©(iV"^). (43) 

\ 2 ) 

Regarding $\xi' = \mathrm{groupmerge}_{J_1,J_2,J_3,J_4}(\xi)$: This can only occur through a large reproduction event
as listed in Subsection 1.1.6. Write $k_i := |J_i|$; we assume $k_1 \ge \cdots \ge k_a \ge 2$ for some $a \in [4]$,
$k_{a+1} = \cdots = k_4 = 0$ (if $a = 1$, $k_1 \ge 3$), and $s := \beta - (k_1 + \cdots + k_a)$ is the number of singletons
(non-participating chromosomes) in the merger. Note that by the structure of the diploid model,
with $a$ groups merging there can be up to $k_1 + \cdots + k_a + (4 - a)^+$ children in the sample (put
differently: up to $(4 - a)^+$ 'non merging children'). Then

(4-a) + 






c'=0 



/l^^^ (4)a+c'UJ 



+ 0(iV"'). 



It remains to check that the diagonal terms behave correctly, i.e. that as $N \to \infty$,

\[ B_N(\xi, \xi) \to G(\xi, \xi) = -\sum_{\xi' \ne \xi} G(\xi, \xi'). \tag{44} \]

Because $\Pi_N$ and $A_N$ are both stochastic matrices (as is $P$), we have

\[ B_N(\xi, \xi) = -\sum_{\xi' \ne \xi} B_N(\xi, \xi') \tag{45} \]

for each $N$. By inspection and the discussion above, all terms in $\Pi_N$ with decay rate $1/N$ are
accounted for in $A_N$, and all non-diagonal terms in $\Pi_N - A_N$ with decay rate $1/N^2$ appear after
multiplication with $N^2$ in $B_N$ with their correct limits, namely the corresponding terms in $G$, while
terms with a faster decay rate disappear in the limit. Hence (45) implies (44). □



1.3 Markov chains with two time-scales — a variation on a lemma of Mohle 

Conceptually, our convergence result rests on a separation of time-scales phenomenon. It can
be established with the help of a variant of a well-known result, see Lemma 1 from Möhle (1998).
Let $E$ be a finite set. We equip matrices $A = (A(x,y))_{x,y \in E}$ on $E$ with the matrix norm
$\|A\| := \max_{x \in E} \sum_{y \in E} |A(x,y)|$. Note that then $\|AB\| \le \|A\|\,\|B\|$ and $\|A\| = 1$ if $A$ is a stochastic
matrix.

Lemma 1.7. Assume that for $N \in \mathbb{N}$, $A_N$ is a stochastic matrix on $E$ such that

\[ \lim_{C\to\infty} \lim_{N\to\infty} \sup_{r \ge CN} \|A_N^r - P\| = 0 \tag{46} \]

for some matrix $P$. Then we have for any $0 < c, K, t < \infty$

\[ \lim_{N\to\infty} \sup_{\|B\| \le K} \big\| (A_N + cN^{-2}B)^{[tN^2]} - (P + cN^{-2}B)^{[tN^2]} \big\| = 0. \tag{47} \]

Furthermore, if $(B_N)_{N \in \mathbb{N}}$ is a sequence of matrices on $E$ such that

\[ G := \lim_{N\to\infty} P B_N P \quad \text{exists}, \tag{48} \]

then

\[ \lim_{N\to\infty} (A_N + cN^{-2}B_N)^{[tN^2]} = P e^{ctG} \quad \text{for all } t > 0. \tag{49} \]

Remark 1.8. Instead of time scales $N$ and $N^2$ one can allow more generally any $a_N, b_N \to \infty$ with
$b_N/a_N \to \infty$, with only notational modifications in the proof.
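The content of the lemma can be illustrated numerically on a toy chain. The Python sketch below is purely hypothetical and is not the ancestral process of this paper: it uses three states (0 = two lineages paired in one individual, 1 = dispersed, 2 = coalesced), a fast matrix $A_N$ that disperses at probability $1/N$ per step, and a slow perturbation $N^{-2}B$ with illustrative constants `a`, `b`, `d` and $c = 1$. For this toy choice $P$ sends state 0 to state 1, $G = PBP$ describes coalescence at rate `b` from the dispersed state, and the closed form of $Pe^{tG}$ is worked out by hand in `limit_matrix`.

```python
from math import exp


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(X, m):
    """Matrix power by repeated squaring."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    while m > 0:
        if m & 1:
            result = mat_mul(result, X)
        X = mat_mul(X, X)
        m >>= 1
    return result


def two_scale_power(N, t, a=2.0, b=1.0, d=0.5):
    """Return (A_N + N^-2 B)^[t N^2] for the toy three-state chain."""
    A = [[1.0 - 1.0 / N, 1.0 / N, 0.0],   # fast dispersal out of the paired state
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
    B = [[0.0, -d, d],                    # slow transitions, rows sum to zero
         [a, -(a + b), b],
         [0.0, 0.0, 0.0]]
    step = [[A[i][j] + B[i][j] / N ** 2 for j in range(3)] for i in range(3)]
    return mat_pow(step, int(t * N ** 2))


def limit_matrix(t, b=1.0):
    """P exp(tG) for the toy chain: only coalescence at rate b survives in the limit."""
    q = exp(-b * t)
    return [[0.0, q, 1.0 - q], [0.0, q, 1.0 - q], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    for N in (50, 200, 800):
        M, L = two_scale_power(N, 1.0), limit_matrix(1.0)
        print(N, max(abs(M[i][j] - L[i][j]) for i in range(3) for j in range(3)))
```

The printed maximal entrywise error should shrink roughly like $1/N$, reflecting the fact that the chain spends only a fraction of order $1/N$ of its time in the paired state, so the rates `a` and `d` drop out of the limit.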



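Proof of Lemma 1.7. We begin with (47). W.l.o.g. assume $K = 1$, otherwise replace $B$ by $B/K$
and $c$ by $cK$. Fix $c, t > 0$ and a matrix $B$ with $\|B\| \le 1$, abbreviate $m := [tN^2]$. Let $\varepsilon > 0$, choose
$C_0 < \infty$ and $N_0 \in \mathbb{N}$ such that

\[ \|A_N^r - P\| < \varepsilon \quad \text{for } N \ge N_0,\ r \ge C_0 N \tag{50} \]

(as guaranteed by (46)). Note that (with the convention $P^0 := I$)

\[ \big\|(A_N + cN^{-2}B)^m - (P + cN^{-2}B)^m\big\| \le \|A_N^m - P\| + \sum_{k=1}^{m} \Big(\frac{c}{N^2}\Big)^k \sum_{\substack{m_1,\ldots,m_{k+1} \in \mathbb{N}_0 \\ m_1+\cdots+m_{k+1} = m-k}} \Big\| A_N^{m_1} \prod_{j=2}^{k+1} (B A_N^{m_j}) - P^{m_1} \prod_{i=2}^{k+1} (B P^{m_i}) \Big\|. \]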

Mimicking the proof in Möhle (1998), we split the second summand into (the ellipses refer to the
term inside the large norm brackets on the right of the last line of the previous formula)

\[ S_1 := \sum_{k=1}^{m} \Big(\frac{c}{N^2}\Big)^k \sum_{\substack{m_1,\ldots,m_{k+1} \ge C_0 N \\ m_1+\cdots+m_{k+1} = m-k}} \|\cdots\| \]

and

\[ S_2 := \sum_{k=1}^{m} \Big(\frac{c}{N^2}\Big)^k \sum_{\substack{m_1,\ldots,m_{k+1} \in \mathbb{N}_0 \\ m_1+\cdots+m_{k+1} = m-k \\ \exists j:\ m_j < C_0 N}} \|\cdots\|. \]

As in Möhle (1998), p. 509, we have $S_1 \le 2e^{t}(t+1)\varepsilon$ for all $N$ large enough; our estimate for $S_2$
is a small variation of the corresponding estimate in Möhle (1998): Note that each of the matrix
norms appearing in the big sum in $S_2$ is at most 2, hence



s2<2y: 



k=l 



m 



# (mi,...,mfc+i) GNg : 



rni + • • • + ruk+i = m — k, 
3j : nij < CqN 



m , CoNA{m-k) 



k=l 
m 



mi=0 



m — mi — 1 
k-1 



< 



<C 



2E 
/I 



m— 1 



k=l ^ ^ k=0 



k 



N' 



(We use in the last estimate that for $|x| < 1$, $n \in \mathbb{N}$, $\sum_{k=0}^{n} \binom{n}{k} x^k = (1+x)^n$ and $\sum_{k=0}^{n} k\binom{n}{k} x^k = nx(1+x)^{n-1}$.)

The derivation of (49) from (47) is literally the same as in Möhle (1998), p. 509-511 (read
$c_N = c/N^2$ there). □



1.4 The convergence result with general random $\Psi_N$

In this section we briefly indicate how the proof of Theorem 1.2 can be modified to yield
Theorem 1.3. In each reproduction event, a random number $\Psi_N$ of individuals die and are replaced by
the same number of offspring, and recall Assumptions (20), (22) and (24). By "short" time-scale we
refer to the scaling $a_N$ given by

\[ a_N := \frac{N}{E[\Psi_N]} \]

and by "long" time-scale the scaling $b_N$ given by

\[ b_N := \frac{N(N-1)}{E[\Psi_N(\Psi_N+3)]}. \]

Assumption (20) yields $b_N \to \infty$ as $N \to \infty$, and $b_N/a_N \to \infty$ by Assumption (21). To check
(23), i.e. that indeed $a_N \to \infty$, observe that $\Psi_N/N$ is a positive random variable, bounded by 1.
Condition (20) is equivalent to $E[(\Psi_N/N)^2] \to 0$, which implies $\Psi_N/N \to 0$ in probability and
$E[\Psi_N/N] \to 0$, hence (23).

For use below, we recall implications of (22) provided that (20) holds (cf. Sagitov (1999)):



For allj > 3 : — E 



N 



N^oo 



ri-2 



[0,1] 



F{dx) 



(51) 



Indeed, integration by parts yields 



-E 



^ N 

N 



jx- 



j-h 



-N J (0,1] 



> x\ dx 



JX 



.i-i 



AT^oo 7(0,1] J{x,l] 

y'-^Fidy). 

(0,1] 



^ N 

N 
y^^F{dy) dx 



(0,1] \^(o,i] 



'^{x<y}Jx^ ^dx y '^F{dy) 



(52) 



Furthermore for the case j = 2 one obtains 



limsup — E 



^ 



iV 



E rM/2^1 



< 1 < oo. 



(53) 



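Let $\Psi_N^*$ have the following reweighted distribution (relative to $\Psi_N$):

\[ P(\Psi_N^* = k) = \frac{k(k+3)}{E[\Psi_N(\Psi_N+3)]}\, P(\Psi_N = k), \quad k = 1, \ldots, N-2, \tag{54} \]

then

\[ \frac{\Psi_N^*}{N} \to F \quad \text{in distribution as } N \to \infty. \tag{55} \]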
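As a numerical aside on (54), the reweighting can be sketched in a few lines of Python. The helper name `reweight_offspring_law` and the two-point example law are our own illustrative assumptions: the example takes $\Psi_N = 1$ with probability $1 - c/N^2$ and $\Psi_N = \lfloor \psi N \rfloor$ with probability $c/N^2$, mimicking the fixed-$\psi$ model, and simply reports the mass that the reweighted law puts on the large value.

```python
from math import floor


def reweight_offspring_law(pmf):
    """Apply the k(k+3) reweighting to a pmf given as {k: P(Psi_N = k)}."""
    norm = sum(k * (k + 3) * p for k, p in pmf.items())   # E[Psi_N (Psi_N + 3)]
    return {k: k * (k + 3) * p / norm for k, p in pmf.items()}


def two_point_law(N, c, psi):
    """Hypothetical example law: one offspring, or floor(psi N) with probability c/N^2."""
    eps = c / N ** 2
    return {1: 1.0 - eps, floor(psi * N): eps}


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        star = reweight_offspring_law(two_point_law(N, c=8.0, psi=0.5))
        print(N, star[floor(0.5 * N)])
```

For this example the mass on the large value tends to $c\psi^2/(4 + c\psi^2)$ as $N$ grows, which follows directly from the weights $k(k+3)$ evaluated at $k = 1$ and $k = \lfloor \psi N \rfloor$.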



Indeed, for any £ E N 



E 



* 



iV 

N 



N{N -I) 
E[M/^(M/^+3)] 



E 



1 



-E 






£+2' 



^ JV 

iV 



£+1 



+ 



'^iv + 3 

A^-1 

3 1 



N-l iN-l)c, 



■E 



^ 



JV 

N 



e+i 



N-^oo 



(0,1] 



y^F(dy) (56) 



by (52) and (53), so (55) follows because the moments characterise a probability law on [0, 1]. One
can check (along the lines of Sagitov (1999)) that under Assumption (20), both (52) and (55) are
in fact equivalent to (22).

The proof of Theorem 1.3 is now a relatively straightforward adaptation of the proof of
Theorem 1.2 discussed in Sections 1.1 and 1.2 above. Scaling by $N$ is throughout replaced by scaling
with $a_N = N/E[\Psi_N]$ and scaling by $N^2$ becomes scaling with $b_N = N(N-1)/E[\Psi_N(\Psi_N+3)]$:

(i) When currently following $b \ge 1$ individuals, the probability that none of them is an offspring
in the previous reproduction event (and hence the sample configuration remains unchanged)
is

\[ E\left[\frac{\binom{N-b}{\Psi_N}}{\binom{N}{\Psi_N}}\right] = E\left[\prod_{j=0}^{\Psi_N-1} \frac{N-b-j}{N-j}\right] = E\left[\prod_{j=0}^{b-1} \frac{N-\Psi_N-j}{N-j}\right] = 1 - O\Big(b\,\frac{E[\Psi_N]}{N}\Big) = 1 - O(a_N^{-1}). \]

This is analogous to transitions discussed in Section 1.1.2 and happens "all the time" (leading
to the projecting transitions part in the limit).

(ii) When currently following $b \ge 1$ individuals, say the $i$-th of which is double-marked, the
probability that the $i$-th individual is the only offspring in the sample, and the sample also
does not contain a parent, is (we write $(x)_k = x(x-1)\cdots(x-k+1)$ for the $k$-th falling
factorial)



E 



^^(JV-^^-2),,i 



E 



"^ 



N 

N 



1 



\T/ \6-l 
^ JV 

N 



a-i(l + o(l)). 



The projection matrix A now becomes 



andA^(C,0 = l-(/3-6)E 
then 



1-^(JV-1'^-2),_i 



'1 -r. 



l<i<P-b (57) 



[1 — Tj^)'^] the analogue of Proposition 



1.4 is

$$\lim_{C\to\infty}\,\limsup_{N\to\infty}\,\sup_{r > C a_N}\,\|A_N^{r} - P\| = 0. \qquad (58)$$

(iii) From now on we can work on the "projected" space £^"'. The distinction between small 
and large reproduction events is irrelevant in the general case. Hence, it is more suitable to distinguish whether a parent and an offspring are in the sample or whether several offspring (but no parent) are in the sample. In analogy with (40) and (41), we split $\Pi_N$ into "fast" and "slow" parts and define

$$B_N^* := b_N(\Pi_N - A_N), \qquad B_N := P B_N^* P. \qquad (59)$$

It then remains to check that

$$B_N \to G \quad \text{with } G \text{ defined in (26)}, \qquad (60)$$

whence Theorem 1.3 follows from Lemma 1.7 together with Remark 1.8.

We now verify (60):

(iv) Recombination events: These give the correct limit, see the discussion below (24).



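(v) "Large": The probability that exactly $k \ge 2$ given individuals among $b$ are offspring (and the sample contains no parent) is, using (54),

$$\mathbb{E}\left[\frac{(\Psi_N)_k\,(N-2-\Psi_N)_{b-k}}{(N)_b}\right] = \mathbb{E}[\Psi_N(\Psi_N+3)]\;\mathbb{E}\left[\frac{(\Psi_N^*)_k\,(N-2-\Psi_N^*)_{b-k}}{\Psi_N^*(\Psi_N^*+3)\,(N)_b}\right], \qquad (61)$$

thus $1/c_N$ times this probability is

$$N(N-1)\,\mathbb{E}\left[\frac{(\Psi_N^*)_k\,(N-2-\Psi_N^*)_{b-k}}{\Psi_N^*(\Psi_N^*+3)\,(N)_b}\right] = \frac{1}{(N-2)_{b-2}}\,\mathbb{E}\left[\frac{\Psi_N^*-1}{\Psi_N^*+3}\,(\Psi_N^*-2)_{k-2}\,(N-2-\Psi_N^*)_{b-k}\right] \;\xrightarrow[N\to\infty]{}\; \int_{(0,1]} y^{k-2}(1-y)^{b-k}\,F(dy) \qquad (62)$$

by (55). Furthermore, the probability that at least 2 offspring and at least one parent are in the sample is at most

$$b\binom{b-1}{2}\,\mathbb{E}\left[\frac{2\,(\Psi_N)_2}{(N)_3}\right] = O(c_N/N), \qquad (63)$$

hence such events become negligible in the limit.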



(vi) "Small" (= a merger of a single pair, which can result either from one offspring and one parent in the sample, or from two offspring but no parent in the sample): Here, the weight of $F(\{0\})$ plays a role.
The probability that exactly two given single-marked individuals in a sample of size $b$ are offspring (and none are parents) is

$$\mathbb{E}\left[\frac{(\Psi_N)_2\,(N-2-\Psi_N)_{b-2}}{(N)_b}\right] \qquad (64)$$

and the probability that among a pair of two given single-marked individuals, one is a parent, the other an offspring, and no other element of the sample is affected by the reproduction event is

$$\mathbb{E}\left[\frac{2\,(2)_1\,(\Psi_N)_1\,(N-\Psi_N-2)_{b-2}}{(N)_b}\right]; \qquad (65)$$

thus, $1/c_N$ times the probability that exactly one given pair (of single-marked individuals) is involved in a reproduction event is

$$\frac{1}{c_N}\,\mathbb{E}\left[\frac{\Psi_N(\Psi_N+3)\,(N-\Psi_N-2)_{b-2}}{(N)_b}\right] = \mathbb{E}\left[\frac{(N-2-\Psi_N^*)_{b-2}}{(N-2)_{b-2}}\right] \;\xrightarrow[N\to\infty]{}\; \int_{[0,1]} (1-y)^{b-2}\,F(dy) = F(\{0\}) + \int_{(0,1]} (1-y)^{b-2}\,F(dy) \qquad (66)$$

by (55).

(vii) (Combinatorial connections between participation in reproduction events and merging of ancestral chromosomes) The rest of the argument in order to replace (15) by (27) is purely combinatorial; it is only concerned with possible groupings of the k single-marked offspring into up to four groups depending on which of the four parental chromosomes they descend from.

In both cases considered in (vi) the probability that the chromosomes actually coalesce is 1/4 because they must descend from the same chromosome in the same parent, or from the particular chromosome in the particular parent we are following, respectively.
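The falling-factorial bookkeeping in items (ii) and (vi) can be illustrated with a short Python sketch. It checks, for a fixed (non-random) offspring number `psi`, that the integrands of (64) and (65) add up to the factor $\Psi_N(\Psi_N+3)$ that appears in (66); the function names and the values of `N` and `b` are illustrative assumptions.

```python
from fractions import Fraction


def falling(x, k):
    """k-th falling factorial (x)_k = x (x-1) ... (x-k+1), with (x)_0 = 1."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def pair_event_probability(N, psi, b):
    """Sum of the integrands of (64) and (65) for a fixed offspring number psi."""
    denom = falling(N, b)
    # (64): both given individuals are offspring, nobody else is affected.
    two_offspring = Fraction(falling(psi, 2) * falling(N - 2 - psi, b - 2), denom)
    # (65): one of the pair is a parent, the other an offspring.
    parent_offspring = Fraction(
        2 * falling(2, 1) * falling(psi, 1) * falling(N - psi - 2, b - 2), denom
    )
    return two_offspring + parent_offspring


N, b = 30, 4
for psi in range(1, N - 1):  # psi = 1, ..., N-2
    combined = Fraction(psi * (psi + 3) * falling(N - psi - 2, b - 2), falling(N, b))
    assert pair_event_probability(N, psi, b) == combined
```

The check rests on the elementary identity $(\psi)_2 + 2\,(2)_1(\psi)_1 = \psi(\psi-1) + 4\psi = \psi(\psi+3)$.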

1.5 Correlation in coalescence times 

In this section we outline the calculations to obtain the correlation in coalescence times $T_1$ and $T_2$ of types at two loci (1 and 2). As our sample consists of two unlabelled chromosomes typed at two loci, we will sometimes find it convenient to denote an unlabelled chromosome carrying ancestral segments at both loci with the symbol |-|, while chromosomes carrying ancestral segments at only one locus are denoted by the symbols |- and -|. Loci at which types have coalesced will be denoted by •-, or •-|. The states of the unlabelled process for a sample of size two at two loci will also be numbered as follows:



© in symbols 
2 {M){M) 

1 (H)(h)(H) 

-1 H)H) 

-2 (K)(K) 

in which states {0, 1, 2} denote the three possible sample states before coalescence at either locus has occurred. States {−1, −2} will be needed when deriving the variance of pairwise differences.

Let $h(i) := \mathbb{P}(\{T_1 = T_2\} \mid i)$ denote the probability of the event $T_1 = T_2$ when the process is in state i. Excluding large offspring numbers, one readily obtains ($h(i) = 0$ for $i \notin \{0, 1, 2\}$)

$$h(2) = \frac{r+9}{2r^2+13r+9}, \qquad h(1) = \frac{3}{2r^2+13r+9}, \qquad h(0) = \frac{2}{2r^2+13r+9}. \qquad (67)$$

For each $i \in \{0, 1, 2\}$, the expression for $h(i)$ is the same as the one for the correlation between $T_1$ and $T_2$ when in state i, excluding large offspring numbers. The expected value $w(i) = \mathbb{E}_i[T_s]$ of the time $T_s$ until a coalescence event at either locus, starting from state $i \in \{0, 1, 2\}$, is, again excluding large offspring numbers,

$$\begin{aligned}
w(2) &= \frac{r+9}{2(2r^2+13r+9)} + \frac12 = \frac12\big(1+h(2)\big),\\
w(1) &= \frac{3}{2(2r^2+13r+9)} + \frac12 = \frac12\big(1+h(1)\big),\\
w(0) &= \frac{1}{2r^2+13r+9} + \frac12 = \frac12\big(1+h(0)\big),
\end{aligned}$$

obtained by solving the recursions

$$\begin{aligned}
w(2) &= (1 + 2r\,w(1))/(1+2r),\\
w(1) &= (1 + w(2) + r\,w(0))/(r+3),\\
w(0) &= (1 + 4\,w(1))/6.
\end{aligned}$$
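As a sanity check, the closed forms for $h(i)$ and $w(i)$ can be substituted back into the recursions. The Python sketch below does this with exact rational arithmetic. The recursions for $h$ used in the code are not quoted in the text; they are an assumption obtained by the same first-step argument (from state 2 both loci coalesce simultaneously at rate 1, while from states 1 and 0 a coalescence at a single locus ends the chance of $T_1 = T_2$).

```python
from fractions import Fraction


def h_closed(r):
    """Closed forms (67) for h(2), h(1), h(0) without large offspring numbers."""
    D = 2 * r * r + 13 * r + 9
    return {2: (r + 9) / D, 1: Fraction(3) / D, 0: Fraction(2) / D}


def check(r):
    """Verify the recursions for w (from the text) and h (assumed) at rate r."""
    r = Fraction(r)
    h = h_closed(r)
    w = {i: (1 + h[i]) / 2 for i in h}
    # Recursions for w as stated in the text.
    assert w[2] == (1 + 2 * r * w[1]) / (1 + 2 * r)
    assert w[1] == (1 + w[2] + r * w[0]) / (r + 3)
    assert w[0] == (1 + 4 * w[1]) / 6
    # First-step recursions for h (our assumption, same transition rates).
    assert h[2] == (1 + 2 * r * h[1]) / (1 + 2 * r)
    assert h[1] == (h[2] + r * h[0]) / (r + 3)
    assert h[0] == 4 * h[1] / 6
    return h, w


for rate in (0, Fraction(1, 2), 1, 10):
    check(rate)
```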



Let $v(i) := \mathbb{E}_i[T_s^2]$ denote the expected value of $T_s^2$ when starting from state $i \in \{0, 1, 2\}$. One can follow Durrett (2002) to obtain the recursions

$$v(i) = \frac{2}{q_i^2} + \frac{2}{q_i}\sum_{k\neq i}\frac{q_{ik}}{q_i}\,w(k) + \sum_{k\neq i}\frac{q_{ik}}{q_i}\,v(k), \qquad (68)$$

in which $q_i = \sum_{k\neq i} q_{ik}$ is the sum of the transition rates out of state i. To obtain (68) let $J$ denote the exponential waiting time until the first transition, and $X_J$ the state of the process immediately after the first transition. The random variables $J$ and $X_J$ are independent. One can write

$$\begin{aligned}
\mathbb{E}\big[T_s^2 \mid J, X_J\big] &= \mathbb{E}\big[(T_s - J + J)(T_s - J + J) \mid J, X_J\big]\\
&= \mathbb{E}\big[(T_s - J)^2 + 2J(T_s - J) + J^2 \mid J, X_J\big]\\
&= \mathbb{E}\big[(T_s - J)^2 \mid X_J\big] + 2J\,\mathbb{E}\big[T_s - J \mid X_J\big] + J^2.
\end{aligned}$$

Taking expectations, with $\mathbb{E}[J] = 1/q_i$ and $\mathbb{E}[J^2] = 2/q_i^2$, gives (68).
The variance $\mathbb{V}_i[T_s]$ of $T_s$ when starting in state i is given by






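$$\begin{aligned}
\mathbb{V}_2[T_s] &= \frac{\cdots}{(2r+1)(r+6)(2r^2+13r+9)} + \frac12 - \frac14\big(1+h(2)\big)^2,\\
\mathbb{V}_1[T_s] &= \frac{r+9}{(r+6)(2r^2+13r+9)} + \frac12 - \frac14\big(1+h(1)\big)^2,\\
\mathbb{V}_0[T_s] &= \frac{r+8}{(r+6)(2r^2+13r+9)} + \frac12 - \frac14\big(1+h(0)\big)^2.
\end{aligned}$$

Hence, $\lim_{r\to\infty}\mathbb{V}_i[T_s] = 1/4$ for $i \in \{2, 1, 0\}$, and

$$\lim_{r\to 0}\mathbb{V}_2[T_s] = 1, \qquad \lim_{r\to 0}\mathbb{V}_1[T_s] = 2/9, \qquad \lim_{r\to 0}\mathbb{V}_0[T_s] = 89/324.$$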

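Denote by $T_l$ the time until coalescence has occurred at both loci. The marginal coalescence times are exponential with rate 1, when excluding large offspring numbers. Solving the recursions

$$\begin{aligned}
\mathbb{E}_2[T_l] &= (1 + 2r\,\mathbb{E}_1[T_l])/(1+2r),\\
\mathbb{E}_1[T_l] &= (1 + \mathbb{E}_2[T_l] + r\,\mathbb{E}_0[T_l] + 2)/(r+3),\\
\mathbb{E}_0[T_l] &= (1 + 4\,\mathbb{E}_1[T_l] + 2)/6
\end{aligned}$$

yields

$$\begin{aligned}
\mathbb{E}_2[T_l] &= \frac32 - \frac{r+9}{2(2r^2+13r+9)} = \frac12\big(3 - h(2)\big),\\
\mathbb{E}_1[T_l] &= \frac32 - \frac{3}{2(2r^2+13r+9)} = \frac12\big(3 - h(1)\big),\\
\mathbb{E}_0[T_l] &= \frac32 - \frac{1}{2r^2+13r+9} = \frac12\big(3 - h(0)\big).
\end{aligned}$$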

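Applying the recursions (68) yields the variances $\mathbb{V}_i[T_l]$:

$$\begin{aligned}
\mathbb{V}_2[T_l] &= \frac{2r^3 + \frac{111}{4}r^2 + \frac{171}{2}r - \frac{81}{4}}{(2r^2+13r+9)^2} + \frac54,\\
\mathbb{V}_1[T_l] &= \frac{4r^2 + 17r - \frac{45}{4}}{(2r^2+13r+9)^2} + \frac54,\\
\mathbb{V}_0[T_l] &= \frac{2r^2 + 7r - 10}{(2r^2+13r+9)^2} + \frac54,
\end{aligned}$$

with $\lim_{r\to\infty}\mathbb{V}_i[T_l] = 5/4$ for $i \in \{0, 1, 2\}$, and

$$\lim_{r\to 0}\mathbb{V}_2[T_l] = 1, \qquad \lim_{r\to 0}\mathbb{V}_1[T_l] = 10/9, \qquad \lim_{r\to 0}\mathbb{V}_0[T_l] = 365/324.$$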
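The mean and variance of $T_l$ can also be obtained numerically from first-step equations of the form (68). The Python sketch below is one way to set this up, under the assumption that the two-locus chain is extended by an auxiliary state (called `"one"` here, a hypothetical label) in which one locus has already coalesced and the other coalesces at rate 1. The solver and all names are illustrative.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (small dense system)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                f = M[i][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


def moments_T_l(r):
    """Return {state: (E[T_l], Var[T_l])} for states 2, 1, 0 (no large offspring)."""
    r = Fraction(r)
    states = [2, 1, 0, "one"]
    rates = {  # transition rates; "abs" means coalescence at both loci
        2: {1: 2 * r, "abs": Fraction(1)},
        1: {2: Fraction(1), 0: r, "one": Fraction(2)},
        0: {1: Fraction(4), "one": Fraction(2)},
        "one": {"abs": Fraction(1)},
    }
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    A = [[Fraction(0)] * n for _ in range(n)]
    for s in states:
        A[idx[s]][idx[s]] = sum(rates[s].values())  # q_i on the diagonal
        for t, q in rates[s].items():
            if t != "abs":
                A[idx[s]][idx[t]] -= q
    m1 = solve(A, [Fraction(1)] * n)      # q_i m1(i) = 1 + sum_k q_ik m1(k)
    m2 = solve(A, [2 * m for m in m1])    # q_i m2(i) = 2 m1(i) + sum_k q_ik m2(k)
    return {s: (m1[idx[s]], m2[idx[s]] - m1[idx[s]] ** 2) for s in (2, 1, 0)}
```

Evaluating `moments_T_l(0)` reproduces the limits $1$, $10/9$ and $365/324$ quoted above for $r \to 0$.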

Now we admit large offspring numbers, take $\varepsilon_N = c/N^2$, and $r_N = r/N$. Ignoring the labelling of the chromosomes, the limit process has three 'effective' sample states, depending on the number of double-marked chromosomes (|-|). Denote the three sample states by (|-|)(|-|), (|-|)(|-)(-|), and (|-)(|-)(-|)(-|), in which |- and -| denote single-marked chromosomes. The states of the limit process are composed of single-marked individuals only, and are therefore the same as those of the haploid Wright-Fisher process. By •- denote a chromosome carrying a common ancestor at one locus, and (•-•) denotes the absorbing states. The transition rates are summarized in the following table:

(M) V' V (H) {!-) (H) V"^/'(H) \^ ')\ 7 



2 



(-) 2r 1 + c% 

(H)(^) l + cf(l-|) r 2 + cf(l-|) 4 

(H) (H) ''32 ^-tcyip 2 8^ ^ ^ '' l^ 2 4 16 J <- A y^ A J f- 16 

i^)n 2+cf (i-f) i+cf 

{•^)i^) r 1 + cf 

By way of example, the rate of the transition from 1 to 2 by coalescence of the chromosomes |- and -| is $1 + cC_{3;2;1}$, the transition rate from 0 to 1 is $4(1 + cC_{4;2;2})$, and the transition rate from 0 to the absorbing state ((•-•) or (•-)(-•)) is $c(C_{4;4;0} + C_{4;2,2;0})$.

As before, let $h(i)$ denote the probability that the two loci coalesce at the same time. One obtains the limit results



lim /i(2) = 1 



lim h{l) - 2 



lim /i(0) 



(69) 

5M^-272V+544 5 

(i/'~6)(3i/>2+16V-48) ~ 3 



The first equation in (69) tells us that the loci remain correlated due to multiple mergers even when they are far apart on a chromosome. When the recombination rate r is quite small, one obtains

$$\lim_{r\to 0} h(2) = 1,$$

lim hfn) - 1 f 8c^2+32 , -80c -03+208 c -02+832 _ r 

imi/(,v"; 3 \^-c^3+6c»/)2+24 """ -3 cV*-16 0^)3+48 cV'2+192 ""^ 

Let $\mathbb{E}_i[T_s]$, as before, denote the expected time until coalescence at either locus, starting from state i. Admitting large offspring numbers, one obtains

^lnnE.[r.] = 3,^j;,_^^, , ie {0,1,2}, 
limEi[r,]=0, iG{0,l,2}, 

4 



limE2[r,j - ^^^^^ 

c(l6V^-2?/'3)+64 
.^.^^j^l^^sj — _c^ ,/,5+6c2 V''-4ci/)3+48cV^+96 

l™ FnfT 1 - 16 4(t/.-8) 32 (39V.-32) 

imiJiLO[-tsJ 3{c(6i/'2-,/)3)+24) (3i/;+16) (ci/)2+4) 3 (c(3 V^+16 V3-48i/)2)_i92) (3^+16) 



lmiEl[rsJ — _^2,/,5^fi^2,/,4_4^,/,3_i_4«r.W,2_ 



Let $\mathbb{E}_i[T_l]$, as before, denote the expected value of the time $T_l$ until coalescence has occurred at both loci, when starting from state i. Admitting large offspring numbers, one obtains the limits



^hm E,[T^] = (,^2^4) (,,^4+^,^2+32) , ^e {0,1,2}, 
limE,[T;]=0, ie {0,1,2}, 



limE2[T;] 
limEiiT;] 

r— >0 

limEo[r/] 
1 — >o 



C»/)2+4 



c(32V^-6?/'3)+128 



-c2 i/)5+6 c2 i/)4_4 c i/)3+48 c V)2+96 

(28V'^-56i/)«-800i/)5+1600i/)"') c2 + (-608V'''-3200V'^ + 12800V'^) c+25600 



in which 



o3„;,9 o ^3„/,8 



o3„/,7 



,2„/,7 



^2,/,6 



^2„/,5 



o = 3 c^V - 2 c>« - 144 c'V' ' + 288 c-'V + 12 c^V- - 80 c^^'' - 1152 c^^j 
+ 3456 cV^ - 288 cV''^ - 2304 c^^ + 13324 cV'^ + 18432 

Considering the variance $\mathbb{V}_i[T_s]$ of the time $T_s$ when starting from state $i \in \{0, 1, 2\}$, and admitting large offspring numbers, one obtains



lim v,[r; 
lim V2[r; 



limV2[T; 
limVi[T;
(c (8^2-^4)^32)^ '
0, ZG{0,1,2}, 
16 



iG{0,l,2}, 



(l2»/'®-128V^+384i/)-*) c2+(3072 V2-512^^) c+6144 
(cV2+4)2 (-cV^+6ci/)2+24)2 



Correlations in coalescence times have been employed to quantify linkage disequilibrium (LD) (McVean 2002), in which LD is quantified as the square of the correlation coefficient of types at two loci (Hill and Robertson 1968). A description of how one can quantify linkage disequilibrium as the square of the correlation coefficient of types at two loci can be found in Hartl and Clark (1989). Assuming a very small mutation rate, McVean (2002) related the prediction $\mathfrak{D}$ of LD to covariances in coalescence times. Writing $\mathrm{Cov}_i[T_1, T_2]$ as the covariance of $T_1$ and $T_2$ when starting from state $i \in \{0, 1, 2\}$, McVean (2002) obtained

$$\mathfrak{D} = \frac{\mathrm{Cov}_2[T_1, T_2] - 2\,\mathrm{Cov}_1[T_1, T_2] + \mathrm{Cov}_0[T_1, T_2]}{(\mathbb{E}[T_1])^2 + \mathrm{Cov}_0[T_1, T_2]} = \frac{\mathbb{E}_2[T_1 T_2] - 2\,\mathbb{E}_1[T_1 T_2] + \mathbb{E}_0[T_1 T_2]}{\mathbb{E}_0[T_1 T_2]},$$

in which $T_1$ and $T_2$ denote the times until coalescence at the two loci, respectively, and the covariances are conditional on the sample configurations, as indicated. Following e.g. Durrett (2002) one can obtain the covariances under any population model. Under our population model, $\mathfrak{D} = \mathfrak{D}_1/\mathfrak{D}_2$, in which

Di = 6400-0^ - 224cV'^ + Slcip^ + 80cV^ - 56c^^^ + 16cV^ - c^^ 

+ r(16cV'*^ - 32cV'^ + GAcip'^ + 256) + 1280, 
2)2 = 14080^-^ - 352cV'^ + 8ci;^ + 512r^ + 176cV^ - 880^^^ + lOc^^ 

+ r(8cV'*^ - 2880^^^ + 832cV'^ + 3328) + 2816. 

One obtains the limit results 



lim 2) = 0, 

r— >oo 



cV' 



c^oo ^»-10i/)2+88i/'-176' 



hm S) 
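For the reference case without large offspring numbers, McVean's ratio can be evaluated directly from the probabilities $h(i)$ in (67). The Python sketch below assumes, as stated in Section 1.5, that the marginal coalescence times are exponential with rate 1 (so that $\mathbb{E}[T_1] = 1$ and $\mathrm{Cov}_i[T_1, T_2] = h(i)$). The function name is hypothetical, and the sketch does not cover the model with multiple mergers.

```python
from fractions import Fraction


def ld_reference(r):
    """McVean-type ratio for the usual ARG (no large offspring numbers).

    Uses Cov_i[T_1, T_2] = h(i) and E[T_1] = 1, so that the ratio is
    (h(2) - 2 h(1) + h(0)) / (1 + h(0)).
    """
    r = Fraction(r)
    D = 2 * r * r + 13 * r + 9
    cov = {2: (r + 9) / D, 1: 3 / D, 0: 2 / D}
    return (cov[2] - 2 * cov[1] + cov[0]) / (1 + cov[0])


# The ratio simplifies to (r + 5) / (2 r^2 + 13 r + 11) and decreases to 0
# as the recombination rate grows.
for rate in (0, 1, 10, 100):
    r = Fraction(rate)
    assert ld_reference(rate) == (r + 5) / (2 * r * r + 13 * r + 11)
```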



1.6 Correlations in coalescence times for random ψ

In this section we consider the simple example of the probability measure F, evoked in relation to a random offspring distribution, taking the beta distribution with parameters ϑ and γ. The following transition rates for a sample of size two at two loci are obtained:



(M) 



|(H) 
(H) 



(I-) (H) 
(I-) (H) 






(•H)(H) 



(— ) 



(l-l) 






H)[^] 




7+3i?/4 
i?+7 


(I-) (H) 


3 

8 


(l+i9)i9 


(I-) (H) 


(1+19+7) {i9+7) 









2r 



4{l+7)7+3'i?7+5(l+'?)'' 
{l+i?+7){'9+7) 





r,7+3i?/4 
^ i?+7 


■& 




4{i?+7) 


2(l+7)7+i'?7+f{l+'?)'? 


,?7+|(l+^^)^ 
(l+i9+7)(i3+7) 


(i?+l)i? 


( 1+19+7) (»9+7) 


4(i?+7+l)(i?+7) 



,7+3i?/4 
' )?+7 



(•H)(H) 



As before, the transition rates given above can be employed to derive correlations in coalescence times. Here we only consider the probability $h(i)$. One obtains $\lim_{\vartheta\to 0} h(i) = \lim_{\gamma\to\infty} h(i)$, and the limit results are those obtained from the usual ARG (67).
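Several entries of the rate table above appear to be low-order moments of a Beta(ϑ, γ) random variable Y, for instance $\vartheta(1+\vartheta)/((1+\vartheta+\gamma)(\vartheta+\gamma)) = \mathbb{E}[Y^2]$ and $(\gamma + 3\vartheta/4)/(\vartheta+\gamma) = 1 - \mathbb{E}[Y]/4$. This reading of the table is our interpretation rather than a statement from the text. The Python sketch below computes such moments exactly; the function name and the parameter values are illustrative.

```python
from fractions import Fraction


def beta_raw_moment(theta, gamma, k):
    """E[Y^k] for Y ~ Beta(theta, gamma): prod_{j<k} (theta + j) / (theta + gamma + j)."""
    m = Fraction(1)
    for j in range(k):
        m *= Fraction(theta + j) / Fraction(theta + gamma + j)
    return m


theta, gamma = Fraction(1), Fraction(3)  # hypothetical parameter values
mean = beta_raw_moment(theta, gamma, 1)
second = beta_raw_moment(theta, gamma, 2)

# Compare with the closed forms that appear in the table.
assert second == theta * (1 + theta) / ((1 + theta + gamma) * (theta + gamma))
assert 1 - mean / 4 == (gamma + 3 * theta / 4) / (theta + gamma)
```

As ϑ tends to 0 or γ tends to infinity all positive moments vanish, which is consistent with the limit results coinciding with those of the usual ARG.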



1.7 Variance of pairwise differences

The variance of pairwise differences between DNA sequences has been employed to estimate recombination rates in low offspring number populations (Wakeley 1997). Let the random variable $K_{ij}$ denote the number of differences between sequences i and j, with $K_{ii} = 0$. The average number π of pairwise differences for n sequences is

$$\pi = \frac{2}{n(n-1)}\sum_{i<j} K_{ij}.$$

Under the infinitely many sites mutation model, $\mathbb{E}[\pi] = \theta\,\mathbb{E}[T]$, in which $T$ is the time until coalescence of two sequences. Under our model, $\mathbb{E}[T] = 1/(1 + c\psi^2/4)$. Define the variance $S^2$ of pairwise differences as

$$S^2 = \frac{2}{n(n-1)}\sum_{i<j}\big(K_{ij} - \pi\big)^2.$$
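For concreteness, the two summary statistics can be computed from aligned sequences as in the following Python sketch. It assumes the normalisation by the number of pairs $n(n-1)/2$ given above and sequences of equal length; the function name and the input format (strings over any alphabet) are illustrative choices.

```python
from itertools import combinations


def pairwise_stats(seqs):
    """Return (pi, S2): mean and variance of pairwise differences over all i < j."""
    n = len(seqs)
    # K_ij: number of sites at which sequences i and j differ.
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    n_pairs = n * (n - 1) // 2
    pi = sum(diffs) / n_pairs
    s2 = sum((k - pi) ** 2 for k in diffs) / n_pairs
    return pi, s2


# Three toy sequences with pairwise differences 1, 2 and 1.
pi, s2 = pairwise_stats(["00", "01", "11"])
```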



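To obtain an estimate of the recombination rate, one needs to compute the expected value

$$\mathbb{E}\big[S^2\big] = \frac{2}{n(n-1)}\sum_{i<j}\mathbb{E}\big[(K_{ij} - \pi)^2\big] = \mathbb{E}\big[(K_{12} - \pi)^2\big].$$

Thus, it suffices to consider $\mathbb{E}[(K_{12} - \pi)^2]$. Expanding, one obtains

$$\mathbb{E}\big[(K_{12} - \pi)^2\big] = \mathbb{E}\Big[\Big(\frac{2}{n(n-1)}\sum_{i<j}(K_{12} - K_{ij})\Big)^2\Big] = \frac{4}{n^2(n-1)^2}\sum_{i<j}\sum_{i'<j'}\mathbb{E}\big[(K_{12} - K_{ij})(K_{12} - K_{i'j'})\big].$$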

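Define the event $A_{ij}^{(\ell)}$ by

$$A_{ij}^{(\ell)} := \{\text{sequences } i \text{ and } j \text{ differ at locus } \ell\}.$$

Assume each sequence consists of $L$ loci, and let $\mathbf{1}(\cdot)$ denote indicator functions. Then

$$K_{12} - K_{ij} = \sum_{\ell=1}^{L}\Big(\mathbf{1}_{A_{12}^{(\ell)}} - \mathbf{1}_{A_{ij}^{(\ell)}}\Big),$$

yielding, in case $i = i' = 1$ and $j = j' = 3$,

$$\mathbb{E}\big[(K_{12} - K_{13})^2\big] = \sum_{\ell=1}^{L}\sum_{\ell'=1}^{L}\mathbb{E}\Big[\Big(\mathbf{1}_{A_{12}^{(\ell)}} - \mathbf{1}_{A_{13}^{(\ell)}}\Big)\Big(\mathbf{1}_{A_{12}^{(\ell')}} - \mathbf{1}_{A_{13}^{(\ell')}}\Big)\Big] = 2\sum_{\ell=1}^{L}\sum_{\ell'=1}^{L}\Big[\mathbb{P}\big(A_{12}^{(\ell)} \cap A_{12}^{(\ell')}\big) - \mathbb{P}\big(A_{12}^{(\ell)} \cap A_{13}^{(\ell')}\big)\Big].$$

In general,

$$\mathbb{E}\big[(K_{12} - K_{ij})(K_{12} - K_{i'j'})\big] = \sum_{\ell=1}^{L}\sum_{\ell'=1}^{L}\Big[\mathbb{P}\big(A_{12}^{(\ell)} \cap A_{12}^{(\ell')}\big) - \mathbb{P}\big(A_{12}^{(\ell)} \cap A_{i'j'}^{(\ell')}\big) - \mathbb{P}\big(A_{ij}^{(\ell)} \cap A_{12}^{(\ell')}\big) + \mathbb{P}\big(A_{ij}^{(\ell)} \cap A_{i'j'}^{(\ell')}\big)\Big]. \qquad (71)$$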

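Now consider the probability $\mathbb{P}(A_{12}^{(\ell)} \cap A_{12}^{(\ell')})$ of the event that sequences 1 and 2 differ at both loci $\ell$ and $\ell'$. Admitting mutation introduces two new states, namely the states numbered −1 and −2 above. Define

$$g(i) := \mathbb{P}(\text{both loci separated by mutation, starting from state } i).$$

Thus, $\mathbb{P}(A_{12}^{(\ell)} \cap A_{12}^{(\ell')}) = g(2)$, $\mathbb{P}(A_{12}^{(\ell)} \cap A_{13}^{(\ell')}) = g(1)$, and $\mathbb{P}(A_{12}^{(\ell)} \cap A_{34}^{(\ell')}) = g(0)$, for $\ell \neq \ell'$.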

Now, 



ff(2) 
5(-l) 
5(-2) 

ff(0) 



9igi-l)+92gi-2) + 2rgil) 

ei + e2 + i + c'^ + 2r 



+ l + c^ 



^1 + 1 + C^ 

gig(-l) + 92g{-2) + rg(0) + (l + c^V4) ff(2) 
^i + ^2 + r + 3 + 3c^(l-f) + c'^ 



16 



M-l)+^25(-2) + c4^5(2)+ ch/^ 



sv* 



,2 V-^ ^" 



+ 4 5(1) 



' 32 



+ c^A^-^-^ +c^-4 



*1 
16 



^U6 + c^(l 



f)+C^+^l + ^2
In view of expression (71), one obtains



(a^o^ n A 



42 



A 



et 



^12 ' ' ^13 



^12 ' ' ^34 



29, 



+ 



01 + 1 + c^^/ A' 



+ 



A4: 



4:2 



g£/2 

20^ + A4 29, + A4 ' 29i + \i\^ + U 




(72) 



The event in (72) occurs if the first two events in the history of the four sequences are mutations on appropriate ancestral lineages, or if lineages labelled 2 and 3 coalesce, followed by appropriately-placed mutations.

Table 4: Estimates of the expected values $\mathbb{E}[R_i]$ of the ratios $R_i := L_i/L$ for $1 \le i \le 4$ at one marginal locus, along with estimates of the standard deviations of $R_i$. Estimates are obtained
from 10^ simulated gene genealogies. 

                        estimated mean                  estimated standard deviation
ψ       c      n     R1     R2     R3     R4        R1     R2     R3     R4
-              6     0.466  0.219  0.138  0.100     0.183  0.167  0.198  0.124
               10    0.378  0.180  0.117  0.085     0.156  0.132  0.120  0.110
               20    0.300  0.146  0.096  0.070     0.119  0.097  0.088  0.081
               50    0.235  0.116  0.077  0.057     0.080  0.063  0.058  0.055
0.005   1      6     0.466  0.219  0.138  0.100     0.183  0.167  0.198  0.124
               10    0.377  0.181  0.117  0.085     0.156  0.133  0.120  0.111
               20    0.299  0.146  0.095  0.071     0.118  0.097  0.088  0.082
               50    0.234  0.116  0.076  0.057     0.080  0.064  0.057  0.054
        1000   6     0.467  0.219  0.137  0.100     0.182  0.167  0.198  0.124
               10    0.377  0.181  0.117  0.085     0.156  0.133  0.120  0.110
               20    0.299  0.146  0.095  0.071     0.119  0.097  0.088  0.082
               50    0.235  0.116  0.077  0.057     0.080  0.064  0.058  0.054
0.5     1      6     0.468  0.217  0.138  0.099     0.184  0.166  0.199  0.124
               10    0.381  0.179  0.115  0.085     0.157  0.132  0.120  0.110
               20    0.304  0.145  0.095  0.070     0.120  0.097  0.088  0.081
               50    0.242  0.117  0.077  0.056     0.081  0.064  0.058  0.054
        1000   6     0.541  0.173  0.116  0.089     0.184  0.152  0.177  0.116
               10    0.566  0.117  0.078  0.058     0.159  0.101  0.090  0.082
               20    0.743  0.101  0.035  0.022     0.084  0.053  0.033  0.027
               50    0.576  0.195  0.089  0.046     0.058  0.051  0.037  0.026


Figure 2: The probabilities h(2), h(1), and h(0) as functions of ψ (lines) for different values of r and c. Values of h(·) obtained from the usual Moran model are shown for reference (symbols).

Figure 3: The expected time $\mathbb{E}[T_s]$ as a function of ψ for different values of c and r. Values of $\mathbb{E}[T_s]$ associated with the case c = 0 are shown for reference (symbols).

Figure 4: The expected time $\mathbb{E}[T_l]$ as a function of ψ for different values of c and r. For explanation of symbols, see Figure 3.

Figure 5: Correlation of the time to coalescence at two loci as a function of ψ for different values of c and r. For explanation of symbols, see Figure 3.

Figure 6: The estimate $\mathfrak{D}$ of the expected value $\mathbb{E}[r^2]$ as a function of ψ for different values of c (see panels), and r. The solid lines represent the value of $\mathfrak{D}$ associated with the usual Wright-Fisher model.

Figure 7: The prediction $\mathfrak{D}$ of linkage disequilibrium obtained from the ARG associated with the Beta(ϑ, γ) distribution. The different lines represent different values of γ (upper panels) or ϑ (lower panels). The broken horizontal line represents the prediction obtained from the usual ARG.

Figure 8: The expected variance of pairwise differences for sample size 50 as a function of the recombination rate r for different values of the parameters c, ψ, and θ as shown.

Figure 9: The expected variance of pairwise differences as a function of sample size for different values of the parameters c, ψ, r, and θ as shown.